1/2* San Diego Padres to win NL West +4000 SIA - 4* New York Jets Wins u9.5 +100 - Check the Twitter feed for MLB and WNBA guesses. Records to be updated conveniently in page labeled as "Record"

Posted NCAAF Home Field Advantage

Extracted from the 2002-2009 Seasons.

HFA is calculated as the difference between the average line and the average home line, weighted by the number of games.

The numbers are deceiving since the HFA is contingent on the home/road schedule.  120 teams in NCAA FBS breeds an unbalanced schedule.  For example, LA Monroe plays SEC teams out of conference on the road and then welcomes their Sun Belt rivals at home during conference season.

The asymmetry is undeniable, which leaves the calculated HFA a not very manageable statistic.  What I should do is constrain the data to conference games, to get a better indicator of how Vegas sees a team’s home advantage versus similar competition.

The page is here.


NFL Line Making by Inverse Pythagorean

In trying to figure out how to create a line after extracting an initial single game win percentage from NFL Futures, I decided to approach the problem from a Pythagorean Expectation point of view.

Recently I’ve become an umbilical purveyor of Pyth.  Every calculation of betting line proceeds by way of filtering through this form of win percentage estimation.  Which is to be expected once ruminating on possible alternatives.  A Least Squares fitted line, a method I have entertained at length and used to great effect, has certain weaknesses that cannot be overlooked, the most glaring being that it does not integrate a total into the formula.  In sports with lower scoring environments, such as MLB, a simple correlation of line and wins serves as a sufficient generator of where such a line would fit on a graph of the two, and vice versa. But football and basketball require a Pyth IMO.

Prior experimentation led me to the inclusion of a winning percentage calculated solely by the line and total of each team.  This is valid in any sport.  For example:

x = line, T = total line, P = winning percentage, n = exponent

P = \frac{\left(\frac{T-x}{2}\right)^n}{\left(\frac{T-x}{2}\right)^n + \left(\frac{T+x}{2}\right)^n}

This can be simplified by using the ratio of line differential.

Ratio = \frac{(T-x)/2}{(T+x)/2} = \frac{T-x}{T+x}

P = \frac{Ratio^n}{Ratio^n + 1}

So given a line and total, with the desired exponent, the equation produces a “Vegas” winning percentage. To take it further, set x equal to the difference between average point differential and average line, and one derives a solid indicator of expected against the spread winning percentage.
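A minimal sketch of the formula above in Python, reading the post's T-x/2 as (T - x)/2, that is, each side's implied score. The function name, the sign convention (a negative line for a favorite) and the sample inputs are assumptions for illustration; the default exponent of 2.37 is the NFL value quoted later in the post.

```python
def vegas_win_pct(line: float, total: float, exponent: float = 2.37) -> float:
    """Pythagorean win probability implied by a point spread and a total.

    line is the team's spread (negative for a favorite), so the team's
    implied score is (total - line) / 2 and the opponent's is (total + line) / 2.
    """
    team_pts = (total - line) / 2.0
    opp_pts = (total + line) / 2.0
    ratio = team_pts / opp_pts
    return ratio ** exponent / (ratio ** exponent + 1.0)


# Hypothetical inputs: a 7-point favorite in a game with a total of 41.
p = vegas_win_pct(line=-7.0, total=41.0)
print(f"{p:.4f}")
```

A pick'em (line of 0) returns exactly 50%, and the two sides of the same game always sum to 100%.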

Again I’ve discussed this at length before, and if you feel compelled to search the site, feel free. It was first introduced right before the start of March Madness. But the formula above is pretty self-explanatory.

Now the challenge is taking the formula, and solving for the variable x.  In the above equation, x is assumed to be a known variable, and P is the unknown.  Finding win probability is the objective of the formula.

It takes a deep journey into the dark recesses of the withering mind to awaken the fundamentals of solving algebraic equations.  As well as rediscovering basic epistolary devices from childhood.   Notions of pencil and paper have long been usurped by more contemporary mediums of creating verbiage.

After hours of an agonizing and pitiful pursuit of overcoming the stunning handicap incurred through time and modern technology, I was able to devise a formula.

I won’t bother you with the cacophony of capricious steps and circuitous routes suffered in finally solving for the variable x (for no rational mind could decipher the madness).

Here is what I came up with, probably not in its simplest form but the end product can be justified.

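Ratio = \left(\frac{P}{1-P}\right)^{1/n}

Substituting that Ratio back into Ratio = (T-x)/(T+x) and solving gives x = T(1-Ratio)/(1+Ratio), so the result is displayed in the form of a line.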
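The inversion can be sketched in Python as follows. It assumes the ratio is (T - x)/(T + x), which rearranges to x = T(1 - Ratio)/(1 + Ratio); the function names and the round-trip inputs are hypothetical and only serve to check that the inverse undoes the forward formula.

```python
def inverse_pyth_line(win_prob: float, total: float, exponent: float = 2.37) -> float:
    """Spread implied by a win probability, a total, and a Pythagorean exponent."""
    ratio = (win_prob / (1.0 - win_prob)) ** (1.0 / exponent)
    # Ratio = (T - x) / (T + x)  =>  x = T * (1 - Ratio) / (1 + Ratio)
    return total * (1.0 - ratio) / (1.0 + ratio)


def pyth_win_prob(line: float, total: float, exponent: float = 2.37) -> float:
    """Forward formula, used here only to check the inversion."""
    ratio = (total - line) / (total + line)
    return ratio ** exponent / (ratio ** exponent + 1.0)


# Round trip with hypothetical inputs: the recovered line should match -3.5.
line = inverse_pyth_line(pyth_win_prob(-3.5, 44.0), 44.0)
print(round(line, 6))
```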

To remain consistent with the previous post on finding single game winning percentage, let’s create a hypothetical with the Bears and Cardinals (had I foresight, I would have analyzed two teams that were scheduled week 1).

Average total to integrate into the formula is based on preference. The game total could be used, or some calculation of prior years. The average of each team’s total from last year happens to fall right around 38, so 38 is eminently practical here.

Winning Percentage will be the one extracted from the future win total.

TEAM W%
ARIZONA 48.80%
CHICAGO 48.86%

Obviously the line created here is likely to be around zero. Regardless, to find the winning percentage, P, run the values through the Log 5 Formula. Then enter the values into the equations. For NFL the understood exponent is 2.37, so I’ll use that.

Ratio = \left(\frac{0.5006}{1 - 0.5006}\right)^{1/2.37} \approx 1
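For reference, the Log 5 step can be sketched like this. The formula is the standard Log5 head-to-head estimate; the function name is an assumption, and the inputs are the two win percentages from the table above.

```python
def log5(p_a: float, p_b: float) -> float:
    """Probability that team A beats team B, given each team's win percentage."""
    return (p_a * (1.0 - p_b)) / (p_a * (1.0 - p_b) + p_b * (1.0 - p_a))


chicago, arizona = 0.4886, 0.4880   # win percentages from the table above
p = log5(chicago, arizona)
ratio = (p / (1.0 - p)) ** (1.0 / 2.37)
print(f"P = {p:.4f}, Ratio = {ratio:.4f}")
```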

With a Ratio of roughly 1 the neutral-field line is roughly zero. Chicago is generally given 2.55 points for HFA, which would make the hypothetical line -2.55.

That was very disengaging. It would be more practical to use an actual game.

Giants are -7 -110 at Pinnacle vs Carolina. The respective winning percentages siphoned from the regular season futures:

TEAM W%
Carolina 41.99%
NY Giants 56.53%

Running through the aforementioned calculations (using the game over/under), the line created:

Giants -5.04

HFA for New York is -2.18. Add the two and the final line is:

Giants -7.22

Virtually identical to the vigged out -7 -110 via Pinny.
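The Giants calculation can be retraced with a few lines of Python. Note the assumption: the post does not state the game over/under it used, so a total of 41 is plugged in here purely for illustration; with that value the sketch lands close to the -5.04 and -7.22 quoted above.

```python
# Win percentages from the futures table above.
giants, carolina = 0.5653, 0.4199
exponent = 2.37          # NFL exponent used in the post
total = 41.0             # ASSUMPTION: the post does not state the game over/under
hfa = -2.18              # Giants home-field adjustment quoted in the post

# Log5 head-to-head probability on a neutral field.
p = giants * (1 - carolina) / (giants * (1 - carolina) + carolina * (1 - giants))

# Inverse Pythagorean: Ratio = (P / (1 - P))^(1/n), then x = T(1 - Ratio)/(1 + Ratio).
ratio = (p / (1 - p)) ** (1 / exponent)
neutral_line = total * (1 - ratio) / (1 + ratio)
final_line = neutral_line + hfa

print(f"P = {p:.4f}, neutral line = {neutral_line:.2f}, final line = {final_line:.2f}")
```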

If you run this operation for every week 1 matchup, the MAD (mean absolute deviation) from the posted lines is approximately 2. Which is rather remarkable.

I made an Inverse Pythagorean Calculator for convenience.


Updated NL/AL MVP Predictor

Some slight changes in MVP position and predicted number of points. The odds have yet to be released offshore. Pujols has hit a devastating surge; he could win the triple crown. Still think Adrian Gonzalez has the best value in either league.

Keep in mind in both leagues, the players listed are all the players that registered in the predictor equation. Once some players are filtered from the equation the odds will drop. This is because my formula to create odds is simply the total number of predicted points divided by two. Logically, if you have over half the number of points you would be declared the winner.


Creating Win Probability from NFL Win Futures

NFL Win Totals are usually presented thus:

Team Wins Over Under
CHICAGO 8 108 -126
ARIZONA 7.5 -139 119

Chicago is posted with an integer total, so a push is possible, while Arizona’s half point makes for a push rate of zero.

Simply, to come up with an expected winning percentage, one could just take the total number and divide by 16 (total regular season NFL games). This equates to an expected winning percentage for Chicago of 50%, and Arizona 46.88%.

But that’s a rather banal and uninviting number when being given the odds of each event happening. I’ve discussed how to convert odds to win probability, then from there to fair value win probability, and feel free to use my calculator to that end.

After conversion the table is reconfigured like so:

Team Wins Over Under
CHICAGO 8 46.3% 53.7%
ARIZONA 7.5 56.02% 43.98%
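The conversion behind the table above can be sketched in Python. The vig is removed here by rescaling the two implied probabilities so they sum to 1; that is an assumption about the method (the post points to its own calculator rather than spelling it out), though it does reproduce the percentages shown. Function names are hypothetical.

```python
def implied_prob(american_odds: float) -> float:
    """Raw (vig-inclusive) probability implied by American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100.0)
    return 100.0 / (american_odds + 100.0)


def fair_probs(over_odds: float, under_odds: float) -> tuple[float, float]:
    """Remove the vig by rescaling both sides so they sum to 1."""
    over, under = implied_prob(over_odds), implied_prob(under_odds)
    return over / (over + under), under / (over + under)


for team, over_odds, under_odds in [("CHICAGO", 108, -126), ("ARIZONA", -139, 119)]:
    over, under = fair_probs(over_odds, under_odds)
    print(f"{team}: over {over:.2%}, under {under:.2%}")
```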

The table is saying the Chicago Bears have a 46.3% chance of winning more than 8 games, and a 53.7% chance of winning fewer than 8 games. Similarly for the Arizona Cardinals, who have been assigned a 56.02% probability of going over 7.5, and 43.98% of winning 7 or fewer.

What we have now are the elements required for a binomial probability scenario: the probability of a certain number of successes given a sample size and the success probability of one single event. In this case the sample size, n, is 16, and the number of successes, x, is the win total. What is missing is the precise measure of success in any one game. But what is known is the answer to the cumulative binomial distribution equation, and that is the fair value odds.

One condition that demands further attention before calculating is the push probability of the 8 wins, for Chicago, that Pinnacle is showing as the win total. The Bears could very well win exactly 8 games, which must be accounted for. Before I proceed it’s necessary to remove the push probability from the equation.

I’ve introduced the framework of a binomial distribution and what that entails here, and explained to capacity, so I won’t expound further. I highly suggest not only reading my post but seeking information at Wikipedia here, and a site aimed at explaining Binomial Probability here. Both offer far more worthy and articulate explanations of the concepts involved, than what one may gather from my elaborate drivel.

To Wit:

given probability p, sample size n, number of successes x

Probability Mass Function
P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}

Cumulative Distribution Function
P(X \le x) = \sum_{i=0}^{\lfloor x \rfloor} \binom{n}{i} p^i (1-p)^{n-i}

For the latter, the index i runs over every number of successes from 0 up to x, so the sum is the probability of x or fewer successes in a sample of size n.

The fair value over price is the cumulative Probability, P, of winning more than x in a 16 game season. While the formula provides the “floor” of success, x, to find the other end of the spectrum subtract the resulting number from 1.

1 - P(X \le x)

The former equation, for convenience I’ll refer to as PMF, is used to determine the possibility of the Bears winning 8 games, which is the push rate.  Again Pinnacle provides us with the answer to the CDF equation.  The Bears fair value over percentage is 46.30%.  The known variables:

1 - P(X \le x) = 46.30%
x= 8
n = 16
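The PMF and CDF defined above can be sketched with the Python standard library. The function names are hypothetical, and p = 0.5 is only a placeholder, since the single game probability is the unknown being solved for.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x): exactly x wins in n games."""
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x): at most x wins in n games."""
    return sum(binom_pmf(i, n, p) for i in range(x + 1))


n, x, p = 16, 8, 0.5             # p = 0.5 is a placeholder single-game probability
print(binom_pmf(x, n, p))        # chance of exactly 8 wins (the push)
print(1.0 - binom_cdf(x, n, p))  # chance of more than 8 wins (the over)
```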

Stata has a built-in function to calculate the binomial probability when the single game probability is unknown but the answer to the formula is given. For those afforded the luxury of having Stata at their immediate disposal, the command “invbinomialtail(n,x,P)” renders the answer to the single game probability.

Without Stata, and to avoid the trouble of solving the equation for the variable p, the Excel Solver add-in serves as a viable alternative.

Copy the data for the Chicago Bears into corresponding cells in Excel. Calculate the winning percentage for the Bears; preferably just enter “=8/16” in its appropriate cell. Then in an adjacent cell, place this formula:

“=1-binomdist(wins-1,16,probability,true)”

Now open Solver, set the target cell to the one containing the formula above, set it to the value of the fair value over odds, and let Solver change the organic winning percentage (total / 16), constrained to a minimum value of 0.

Copy the altered winning percentage to an unoccupied cell. Enter the formula into another adjacent cell:

“=binomdist(wins,16,probability,true)”

Run solver again, except this time set the solver result equal to the fair value under odds.

Average both winning percentages derived from the above equations.

For the Bears their projected winning percentage after negotiating the operation described above is 48.86%, or 7.82 wins.  Now enter the probability, p, of 48.86% into the PMF equation, and the push rate ≈ 19.49%.
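The two Solver runs can be approximated without Excel. The sketch below swaps Solver for a plain bisection search, which is a substitution and not the post's method; it solves P(X >= 8) = fair over and P(X <= 8) = fair under for p, then averages the two. Function names are hypothetical. It comes out near the 48.86% above, while the push rate it prints need not match the post's figure exactly, since the exact Solver settings are not given.

```python
from math import comb


def tail_ge(k: int, n: int, p: float) -> float:
    """P(X >= k) for a binomial(n, p); mirrors 1 - BINOMDIST(k - 1, n, p, TRUE)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def solve_p(target: float, func, increasing: bool, tol: float = 1e-10) -> float:
    """Bisection on p in (0, 1) so that func(p) equals target."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n, wins = 16, 8
fair_over, fair_under = 0.4630, 0.5370          # Bears fair values from the table

# Over side: P(X >= 8) = fair over.  Under side: P(X <= 8) = fair under.
p_over = solve_p(fair_over, lambda p: tail_ge(wins, n, p), increasing=True)
p_under = solve_p(fair_under, lambda p: 1 - tail_ge(wins + 1, n, p), increasing=False)

p = (p_over + p_under) / 2
push = comb(n, wins) * p ** wins * (1 - p) ** (n - wins)
print(f"p = {p:.4f}, expected wins = {n * p:.2f}, push rate = {push:.4f}")
```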

Future win totals with a half point, such as the Cardinals, are far more accessible. Little effort is required after working through the aforementioned process. As a way of double checking the validity of the projected single game winning percentages, the sum total of all the new win totals should be roughly 256. 256 is the maximum number of wins distributed through the course of the NFL season. And after arbitrarily giving the currently OTB Minnesota Vikings 8 wins, the sum of all the win totals is equal to 256.79. Conversely, had you not taken the push probability into consideration, the sum total would have been 234.

To speed up the calculations, I have written an Excel macro to run Solver equations in a loop. Solver uses absolute references, disabling the ability to use the cell range (i.e. “…Cells(a,b).Value”) in the macro. It’s an easy fix, by way of cell address and the looped variables i and k.

Just point the code below to whatever cells contain the relevant data.

Of consequence is the essence of what the concluding numbers represent. Logistically it’s the respective team’s average single game winning percentage over the course of the season. When comparing two teams using these win total projections, to find the expected Vegas Moneyline, just insert the team single game win probability into the Log 5 calculator.

From there one can take it a step further and find the expected line based on some of the newly constructed information, which entails a slight inversion of the Pythag formula. I’ll explore those contingencies at a later date.


NL and AL Cy Young


The requisites for establishing a reasonable prediction of the likely Cy Young candidates were similar to what I did with the MVP. Of course different statistics needed to be applied, but the essence of the framework remained the same. Find numbers of consequence and regress.

But it doesn’t take long for one to notice, after an inventory of the candidates of years prior, that the appropriation of voting points to starters as well as closers permits the process to be negotiated via arbitrary re-configuration of statistics. This means, since closers are measured by accumulating saves, an equalizer of saves and wins serves the purpose of adjoining closers and starters into the same regression. Relievers that are not closers are S-O-L. I can assure you, with a fair amount of certainty, a middle reliever will not win the Cy Young.

The aforementioned requires some degree of mathematical ingenuity. The term arbitrary finds itself useful in describing such a process of ingenuity, in addition to being thrown around in describing similar applications. I find it to be a very flexible word of choice. I don’t think it presumptuous to denote the term as having, uniquely, a universal privilege for relating the vicissitudes of the day. When you walk to some particular location, the steps you take are not so much planned as they are the result of having authority over the steps you take for the convenience of arriving at the desired objective. Therefore the route can be considered arbitrary, subjected to the judgment of the walk-man.

Arbitrary in mathematics refers to a constant with an undetermined value. Therefore a formula can determine the value, and act as the constant itself.

During the MVP regression process, I felt it practical to provide a certain weight to players playing on teams that make the playoffs (since a playoff appearance by a player has a positive correlation to voting points), by adding the square root of wins to itself, producing the formula:

Given team wins (TW), solving for weighted team wins (TWx)

TWx = \sqrt{TW} + TW

The results served the purpose of adequacy, and I didn’t feel it necessary to experiment further. Obviously the weight will be incommensurate with the reality of the situation, though not beyond the risk of ineffably skewing the odds. I did not much deliberate on the matter simply because the resulting odds appeared to be agreeable to reason.

To proceed with creating a formula to assimilate starters and closers into one equation, I partitioned the two types of pitchers, and found the correlations. Closers are well represented in the voting practices of the writers, for both leagues.

I was initially surprised to find out team wins, and playoff appearance had a negative correlation to voting points. A playoff appearance cost a player around six points. But my vexation was swiftly addressed and pacified through recall of some of the past winners. Call it the Roy Halladay effect, or concomitantly, the Lincecum effect.

More in line with reason, ERA and WAR were highly correlated to the trends of the voters, with starting pitchers and closers. As well, saves were treated as consequential with closers, and player wins with starting pitchers.

However, that is basically it. Unexpectedly, strikeouts, strikeout ratio, and strikeout/walk ratio created very little substance in the evaluation of positive or negative relationships. FIP and other advanced pitching metrics are yet seen to be invidious creations of new age sabermetricians, and the old habits of the writers persist. (Though with the recent influx of some prominent sabermetricians into the BBWAA, hard-core baseball statistics and advanced metrics may usurp the incapacity of conventional numbers.)

Now what is left is a painstaking gap to fill, that of saves and wins. How can the two reconcile to prescribe to the limits of sample size?

Left to the devices of my own ingenuity, it didn’t take but seven to ten minutes of concerted thought to find resolve. And using the fundamentals of the simple formula concocted from the MVP regression, demonstrated above, I merely jutted the basics until my humble vanity was content.

Wins (W), Saves (SV), solving for SVW (Saves to Wins)

SVW = \left(\frac{\sqrt{SV}}{2}\right)^2 + W
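Both weighting formulas are small enough to sketch directly in Python. Function names are hypothetical. As written, the saves formula reduces algebraically to SV / 4 + W, so four saves count as one win; the two sample inputs are a starter line (21 wins, no saves) and a closer line (3 wins, 47 saves).

```python
from math import sqrt


def weighted_team_wins(team_wins: float) -> float:
    """TWx = sqrt(TW) + TW, the playoff weight from the MVP regression."""
    return sqrt(team_wins) + team_wins


def saves_to_wins(wins: float, saves: float) -> float:
    """SVW = (sqrt(SV) / 2)^2 + W, which reduces algebraically to SV / 4 + W."""
    return (sqrt(saves) / 2.0) ** 2 + wins


print(saves_to_wins(21, 0))    # starter: SVW is just the win total
print(saves_to_wins(3, 47))    # closer: saves are scaled down to a win-like number
```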

Closer wins are putatively rudiments of blown saves, so more or less that is a neutral statistic. One major difference: on this occasion I decided the conclusion should be instantiated to at least some end and with rationale. And I retrofitted the formula to a formidable interval of average wins. I simply averaged the wins of starters, and then found averaged wins with closers after implementing the formula. I did this for both leagues.

AL:
SP average wins = 18.18
CP average wins = 15.87

NL:
SP average wins = 17.83
CP average wins = 14.23

The re-adjustment of saves to wins naturally combats the asymmetry between SP and CP ERA, therefore the weight of the coefficients is equalized. And to further validate the insertion of the formula, Eric Gagne remained the 2003 winner upon applying post-regression numbers.

The invention left me with a sense of great satisfaction, of which I do not hesitate to admit. The arbitrary formula, pragmatically, is rather banal outside the spectrum of this particular regression. It’s a meaningless scale that is not afforded qualities of general utility in evaluating players. WAR and WPA are sufficient for comparison across subsets of positions.

Here are regression coefficients and odds.

American League

      Source |       SS       df       MS              Number of obs =      66
-------------+------------------------------           F(  3,    63) =   47.04
       Model |  165575.187     3  55191.7289           Prob > F      =  0.0000
    Residual |  73919.2691    63  1173.32173           R-squared     =  0.6914
-------------+------------------------------           Adj R-squared =  0.6767
       Total |  239494.456    66  3628.70387           Root MSE      =  34.254

------------------------------------------------------------------------------
     votepts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         war |   7.365279   2.725402     2.70   0.009     1.918998    12.81156
         era |  -29.23094   5.821598    -5.02   0.000    -40.86448   -17.59741
         svw |      5.355   1.420173     3.77   0.000     2.517011     8.19299
------------------------------------------------------------------------------

Cy Young Odds:

NAME G WINS LOSSES K/9 BB/9 ERA SV PROB ODDS
David Price 32 21 7 8.4 3.8 3.03 0 21.37% 3.6/1
Jon Lester 33 18 10 9.2 3.1 3.12 0 17.75% 4.6/1
CC Sabathia 34 21 7 6.9 3 3.45 0 15.65% 5.3/1
Clay Buchholz 27 18 7 6.1 3.4 2.99 0 15.53% 5.4/1
Trevor Cahill 29 17 7 5.2 2.7 2.87 0 13.77% 6.2/1
Carl Pavano 33 21 10 5.2 1.6 3.95 0 13.27% 6.5/1
Mariano Rivera 60 4 3 7.7 1.5 1.98 33 11.97% 7.3/1
Jered Weaver 34 15 10 10 2.3 3.21 0 11.53% 7.6/1
Rafael Soriano 65 3 1 7.5 1.9 2.32 47 10.19% 8.8/1
John Danks 32 16 11 7 2.8 3.51 0 10.03% 8.9/1
Joakim Soria 67 0 3 9.7 2.4 2.05 45 9.29% 9.7/1
Cliff Lee 29 14 8 7.8 0.5 3.45 0 7.55% 12.2/1
Jonathan Papelbon 67 5 7 8.2 3.6 2.57 39 6.38% 14.6/1
Francisco Liriano 32 15 10 9.8 2.8 3.54 0 5.96% 15.7/1
Andrew Bailey 53 1 4 6.5 2.5 1.62 28 5.84% 16.1/1
Justin Verlander 33 18 10 8.4 3.3 3.81 0 4.80% 19.8/1
Jeff Niemann 30 14 4 6.5 2.8 3.29 0 4.52% 21.1/1
C.J. Wilson 33 15 7 7 4.2 3.68 0 4.18% 22.9/1
Felix Hernandez 35 11 14 8.2 2.5 2.99 0 3.92% 24.4/1
Brian Duensing 60 8 1 5 2.2 2.19 0 1.93% 50.9/1
Alexi Ogando 35 4 1 8.4 3.7 1.01 0 1.77% 55.6/1
Andy Pettitte 25 15 3 7 3 3.70 0 1.42% 69.4/1
Phil Hughes 30 19 7 7.4 2.5 4.01 0 1.39% 70.8/1

National League

Regression results:

      Source |       SS       df       MS              Number of obs =      64
-------------+------------------------------           F(  3,    61) =   49.35
       Model |  210353.074     3  70117.6914           Prob > F      =  0.0000
    Residual |  86668.9258    61  1420.80206           R-squared     =  0.7082
-------------+------------------------------           Adj R-squared =  0.6939
       Total |      297022    64  4640.96875           Root MSE      =  37.694

------------------------------------------------------------------------------
     votepts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         war |   9.735335   2.944883     3.31   0.002     3.846677    15.62399
         svw |   6.053746   1.591225     3.80   0.000     2.871894    9.235598
         era |   -37.1176   7.397198    -5.02   0.000    -51.90921   -22.32599
------------------------------------------------------------------------------

Cy Young Odds:

NAME G WINS LOSSES K/9 BB/9 ERA SV WAR PROB ODDS
Adam Wainwright 35 24 8 8.1 2.1 2.51 0 7.54 33.97% 1.9/1
Ubaldo Jimenez 33 24 4 8.4 3.6 3.00 0 7.20 27.68% 2.6/1
Roy Halladay 35 21 11 8.2 1 3.01 0 8.45 26.30% 2.8/1
Tim Hudson 33 19 7 4.9 3 3.03 0 7.55 21.14% 3.7/1
Josh Johnson 33 14 7 8.8 2.2 2.65 0 7.89 17.00% 4.8/1
Mat Latos 30 17 7 8.9 2.6 2.45 0 4.71 15.17% 5.5/1
Billy Wagner 73 8 3 12.5 2.6 2.19 40 2.47 14.45% 5.9/1
Heath Bell 71 7 0 11.5 3.7 2.55 48 2.35 11.94% 7.3/1
Johan Santana 34 14 8 6.4 2.7 3.04 0 5.90 7.60% 12.1/1
Chris Carpenter 36 18 6 7.1 2.5 3.52 0 4.75 6.99% 13.3/1
Brian Wilson 68 4 1 12.6 3.4 2.67 45 3.13 6.61% 14.1/1
Yovani Gallardo 31 15 7 9.9 3.6 3.13 0 4.08 3.90% 24.6/1
Jaime Garcia 31 14 7 7.1 3.6 2.78 0 3.21 3.47% 27.7/1
Jonny Venters 76 5 0 9.6 4 1.19 1 2.20 3.43% 28.1/1
John Axford 48 10 1 10.4 4 2.90 23 1.63 0.29% 347.4/1
Clayton Kershaw 33 14 10 9.4 3.8 3.21 0 3.81 0.06% 1764.1/1
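As a rough illustration of how the NL coefficients turn a table row into fitted vote points, here is a short Python sketch. It assumes the regression has no intercept, which is a reading of the degrees of freedom in the output above rather than something the post states. It does not reproduce the PROB and ODDS columns, whose conversion is not spelled out. Function names are hypothetical.

```python
# NL coefficients copied from the regression table above (no intercept term is
# listed, so none is used here; that reading of the output is an assumption).
NL_COEF = {"war": 9.735335, "svw": 6.053746, "era": -37.1176}


def svw(wins: float, saves: float) -> float:
    """Saves-to-wins scale: (sqrt(SV) / 2)^2 + W, i.e. SV / 4 + W."""
    return saves / 4.0 + wins


def predicted_vote_points(war: float, era: float, wins: float, saves: float) -> float:
    """Fitted vote points for one pitcher under the NL regression."""
    return (NL_COEF["war"] * war
            + NL_COEF["svw"] * svw(wins, saves)
            + NL_COEF["era"] * era)


# Rows taken from the NL table: (name, WAR, ERA, W, SV).
pitchers = [
    ("Adam Wainwright", 7.54, 2.51, 24, 0),
    ("Billy Wagner", 2.47, 2.19, 8, 40),
]
for name, war, era, wins, saves in pitchers:
    print(f"{name}: {predicted_vote_points(war, era, wins, saves):.1f}")
```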

At this point there is nothing to compare the odds to. They haven’t been released offshore or in Vegas. At length I’ll revisit with some thoughts on the matter.
