This is how I’ve decided to approach any and all measures of performance, statistics, and various other factors. I use the line that Vegas convenes and triangulate all my data to the line. I did it with Basketball, Baseball, and now the next step is football. Here I’m focusing on College Football. Not sure I’ll even attempt to model an NFL database, the sport is a different animal. For the NFL, I’ll just stick with intuition and finding good fades on the internetz. And the fades emanate with resplendent fervor if you search the forum spectrum for CAPS LOCKS AND !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Back to College Football. First I want to find how substantial the line itself is, meaning how a team’s overall line compares to team wins. Since for gamblers the spread is the only thing that matters, this may seem meaningless. But in order to appropriate a spread with precision and a marked sense of sharpness, it helps to know how well a team performs over the season, which is highly correlated to wins and losses. That, subsequently, can be expressed, with a measure of consequence, via margin of victory, using the proverbial Pythagorean Win Percentage calculation.
Having extracted data from Statfox using the methods described here and here (I can assure you, regardless of the tendency for my computer to show a personified resentment by way of heating up, because of all the work it has been given, this is still a very economical method of extraction), the next step is to serry the data in a fashion that is conducive to evaluation and analysis. And for me this involves calculating the average line for each team, then translating the line into wins using a least squares line. The relationship between average line and wins should obviously be highly correlated, and with a low margin of error, the wins formulated via where a line falls on the highly linear trend can be used as a starting point.
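A rough sketch of that least squares step, in Python rather than the Excel/VBA I actually used. The (average line, wins) pairs are made up for illustration, and the sign convention (a negative line means the team was favored) is an assumption of this sketch:

```python
# Hypothetical (average line, season wins) pairs; not real Statfox data.
# Assumed convention: a negative average line means the team was favored.
teams = [(-14.0, 11), (-7.5, 9), (-3.0, 8), (0.0, 6), (4.5, 5), (10.0, 3), (17.0, 1)]

def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def linear_wins(avg_line, intercept, slope):
    """Wins implied by where an average line falls on the fitted trend."""
    return intercept + slope * avg_line

intercept, slope = fit_line(teams)
for line, wins in teams:
    fitted = linear_wins(line, intercept, slope)
    print(f"line={line:6.1f}  actual={wins:2d}  linear={fitted:5.2f}")
```

The gap between actual wins and linear wins for each team is the starting point for judging who was over or underrated.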
With the actual wins, and the linear wins from average line, there are a couple more ways to find expected season wins: a direct Pythagorean expectation, points scored^x / (points scored^x + points allowed^x), and the Pythagorean formula replacing the observed points scored with the average line. (I did this with college basketball, and explained the calculation to capacity here. It’s pretty straightforward and obvious: average the team points scored and points allowed, and subtract or add half the average line margin; if the average line is below zero, add half the line to points scored, etc…).
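Here is a minimal sketch of both Pythagorean versions. The function names are mine, the numbers are invented, and the line-based version is only my reading of the shorthand above: take the midpoint of points scored and allowed, then push half the line margin to each side, assuming a negative line means the team was favored.

```python
def pythagorean_pct(points_for, points_against, exponent):
    """Pythagorean win percentage: PF^x / (PF^x + PA^x)."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)

def line_implied_points(avg_points_for, avg_points_against, avg_line):
    """Swap observed scoring for scoring implied by the average line.

    Assumption: a negative avg_line means the team was favored, so a line
    of -10 becomes a +10 expected margin, split evenly around the midpoint
    of points scored and points allowed.
    """
    midpoint = (avg_points_for + avg_points_against) / 2
    half_margin = -avg_line / 2
    return midpoint + half_margin, midpoint - half_margin

# Hypothetical team: scores 35, allows 21, average line of -10, 12 games.
exponent = 2.24
games = 12
observed = pythagorean_pct(35.0, 21.0, exponent)
implied_for, implied_against = line_implied_points(35.0, 21.0, -10.0)
from_line = pythagorean_pct(implied_for, implied_against, exponent)
print(f"Pythagorean wins (observed points): {observed * games:.2f}")
print(f"Pythagorean wins (line points):     {from_line * games:.2f}")
```

A team that outscores its line this way shows more Pythagorean wins from observed points than from line points, which is the kind of gap worth looking at.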
The one thing with Pythagorean expectation that always has to be taken under consideration is the value of the exponent. Some prefer a constant derived from observed past data; here I used the exponent that corresponded to the lowest absolute average difference between observed and expected, moving from season to season. Different seasons induced different exponents; higher scoring environments generally relate to higher exponents, and vice versa.
In order to isolate the most ideal exponent for each season, the lowest absolute average difference (often referred to as the mean absolute difference, or MAD), is used rather than the actual difference because what I am concerned with is the error difference from actual percentage, rather than a true difference. I want everything to compare to zero. So a difference of -.05 and .05 won’t average to zero, instead both will have an absolute difference of .05. (Again I laid all the framework for this out in my college basketball dirges). There are other ways to find the best exponent, but MAD, despite its inferences, is a sound mathematical tool.
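The exponent search can be sketched as a simple grid search. This is an illustration, not my actual spreadsheet: the season data is invented, and the search range (1.0 to 4.0 in steps of 0.01) is an arbitrary choice for the example.

```python
def pythagorean_pct(points_for, points_against, exponent):
    """Pythagorean win percentage: PF^x / (PF^x + PA^x)."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)

def mean_abs_diff(teams, exponent):
    """Average of |actual win pct - Pythagorean win pct| over all teams."""
    diffs = [abs(actual - pythagorean_pct(pf, pa, exponent))
             for pf, pa, actual in teams]
    return sum(diffs) / len(diffs)

def best_exponent(teams, low=1.0, high=4.0, step=0.01):
    """Try every exponent on the grid and keep the one with the lowest MAD.

    Candidates are rounded to two decimals, which matches the default step.
    """
    steps = int(round((high - low) / step))
    candidates = [round(low + i * step, 2) for i in range(steps + 1)]
    return min(candidates, key=lambda x: mean_abs_diff(teams, x))

# Hypothetical season: (points scored, points allowed, actual win pct).
season = [
    (38.5, 17.2, 0.846),
    (31.0, 24.5, 0.615),
    (27.3, 27.9, 0.500),
    (21.4, 30.8, 0.333),
    (18.0, 35.1, 0.167),
]

x = best_exponent(season)
print(f"best exponent: {x}, MAD: {mean_abs_diff(season, x):.4f}")
```

Using the absolute value is what keeps a -.05 miss and a +.05 miss from cancelling each other out in the average.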
All the data from 2002-2009 were calculated with this exact framework in mind. I did one more thing in order to rate the teams from the last eight years. This will be just a slight digression, but one that I think is very meaningful and explanatory. Instead of only finding the average line, I decided to do an adjustment to opponent’s average line. I extracted every opponent’s line for each team (more VBA code), found the average line for that opponent, and subtracted the difference. It’s overly simple, and a more precise formulation is probably waiting to be found, but regardless, once sorted by adjusted line, the order of the teams took on a different look, similar to what I did with Starting Pitcher Line Weight. For example, in 2009 instead of Nevada being in the top 25 in average line, once adjusted their ranking moved down to 40 or so. Same with Boise State or TCU: the teams in the second tier conferences move down a few slots after the adjustment, which is what I set out to do. The formula is direct and simple, and the result is substantial enough to warrant an inclusion of the formula into a sort of ranking system.
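One possible reading of that adjustment, sketched in Python. Everything here is an assumption for illustration: the teams and schedules are invented, a negative line is taken to mean the team was favored, and under that convention "subtracting the difference" works out to adding the opponents' mean average line to the team's own (with the opposite sign convention it would be a subtraction).

```python
# Hypothetical average lines (negative = favored) and schedules.
avg_line = {"A": -12.0, "B": -10.0, "C": 3.0, "D": 8.0, "E": -6.0}
schedule = {
    "A": ["C", "D"],   # A piles up a big line against weak opponents
    "B": ["E", "A"],   # B earns its line against strong opponents
    "C": ["A", "D"],
    "D": ["A", "C"],
    "E": ["B", "C"],
}

def adjusted_line(team, avg_line, schedule):
    """Shift a team's average line by its opponents' mean average line.

    Opponents with no line on record are skipped. Weak opponents (positive
    average lines) pull a favorite's line back toward zero; strong
    opponents (negative average lines) push it further out.
    """
    opp_lines = [avg_line[o] for o in schedule[team] if o in avg_line]
    if not opp_lines:
        return avg_line[team]
    return avg_line[team] + sum(opp_lines) / len(opp_lines)

by_raw = sorted(avg_line, key=lambda t: avg_line[t])
by_adjusted = sorted(avg_line, key=lambda t: adjusted_line(t, avg_line, schedule))
print("raw order:     ", by_raw)
print("adjusted order:", by_adjusted)
```

In this toy example A has the best raw line but drops behind B once the schedule is accounted for, which is the same kind of move Nevada, Boise State and TCU made in the real 2009 data.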
I’ll just show the table for 2009, but here is the average line and adjusted line top 10:
AVERAGE LINE 2009
ADJUSTED LINE 2009
(For you Hokie fans out there, VaTech was rated 11th in both with and without adjustment)
The lightish purplish pinkish (or how about just pink) cells show the descriptive statistics. It’s amazing, and disgusting at the same time, that the average line for 120 NCAAF teams approaches zero. Linemakers are the least appreciated operation in terms of level of sophistication and intelligence on the market. Another thing that should be mentioned is the variance of the opponents’ line. For the most part, teams schedule neutral teams, or the schedule is such that there is an equilibrium between the degree of difficulty and the degree of cupcake. Maybe a team like Troy will schedule Georgia and Florida, and be underdogs of 20 to 30, but once they get into Sun Belt conference play their average conference line may be around 7-10 point favorites. The weight of conference games is merely showing its effect here. Because the distribution of league wins (and the average line) is for the most part Gaussian (a tendency to cluster around a 50% winning percentage, or a line of zero), I can make the assumption that 2/3 of the teams schedule opponents that range from a line of 2.27 to -2.27, which is an indication of how well balanced conference play is in my opinion.
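The 2/3 figure is just the usual share of a normal distribution that sits within one standard deviation of the mean. A quick check, assuming 2.27 is the standard deviation of the opponents' average line and the mean is zero (both are my reading of the pink cells, and the Gaussian shape is itself an assumption):

```python
from statistics import NormalDist

# Assumed: opponents' average line ~ Normal(mean 0, standard deviation 2.27).
sd = 2.27
opponent_line = NormalDist(mu=0.0, sigma=sd)

# Share of teams whose opponents' average line falls between -2.27 and 2.27.
share_within_one_sd = opponent_line.cdf(sd) - opponent_line.cdf(-sd)
print(f"share within one standard deviation: {share_within_one_sd:.1%}")
```

That comes out to about 68%, or roughly two out of every three teams.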
With the above data merely as a nice ranking system, and perhaps a starting point for future team metric formulations, here are all the actual vs expected win measures from 2002 to 2009 (which I mentioned before the digression; BTW I used the actual average line to find expected wins, rather than adjusted, for arbitrary reasons) sorted by adjusted line, only showing the top ten. The exponent used for each season is in the yellow header beside the year. It’s pointless to wait for the 2010 season to end to find the most optimal exponent, since doing so would imply the season has already ended and team evaluations finished without exploiting the data for the sake of gambling on the teams. Judging by the table, I think an exponent of around 2.24 should suffice in Pythagorean calculations to assess team by team scenarios for the 2010 season. If you want all the data email me and we will have to work something out. Perhaps a data swap, or sign up for one of the affiliates and I’ll send you some of my excel sheets. I need a list of returning starters (already have 2008 and 2009) and preseason sportsbook future win totals back to 2002 (have 2009 for 47 teams).
*The Pink Cells are average for all teams for the respective seasons
*Texas and USC just beat the shit out of everybody in 2005
*Games scheduled against teams for which no line was placed on the game were excluded from win evaluations. Therefore a Team A with a higher line than Team B can still have fewer Pythagorean line wins and linear line wins.
A brief survey of all the data demonstrates that team wins predicted by least squares fit is, for the sport as a whole, the best way to compare to actual wins, or at least the more accurate method. This is clear purely by looking at the average wins. The Pythagorean formulas seem to invariably overshoot expected wins throughout all teams on average for each season. I’ve always thought the wins founded by the linear relationship between line and wins is one of the best ways to measure how over or underrated a team is. Like I said before, this sort of model worked very well for college basketball and the NBA, even though one year success rate could be completely random fluctuation of variance, and has performed adequately enough in MLB. (Remember I was down 15x before implementing the all-encompassing starting pitcher behemoth of which I only allow a handful of people to use)
For Vegas (or nowadays Pinnacle) has a collection of the best and most sophisticated equalizers for team performance that even seems to transcend how the team performs and what the team thinks of themselves. They’ve manifested an entirely new way to judge how good teams are, though their results are largely ignored by the MSM because of the bad stigma surrounding sports gambling. Vegas knows best, always.
Eventually, I’ll post how the different statistical variables impact, or correlate to, the average line (yards per play, yards per point, rushing yards per attempt differential?), and perhaps find a nice and easy formula using coefficients as weights to find an expected line. Then I can start running year to year regression, how each variable extrapolates to the following season’s average line, etc…
This could (will?) allow me to disclose an ATS W/L record that might translate to an optimal level of expectation vs the observed record (again I did something similar with college basketball in trying to predict ATS records, though not nearly as involved).