Posts Tagged sportsbook
Sportsbooks haven’t convened MVP odds yet because I haven’t posted them myself. This is an obvious observation to anybody that visits this blog on a yearly basis. I think we’d all agree on this. (I use the terms “we’d all” and “nobody in particular” interchangeably).
The formula behind setting a probability on a given player’s chances can be expressed as:
If a player doesn’t register a positive number of MVP points, the variable v, then he is simply ignored. The points are calculated slightly differently in the NL and AL, and the years 2000-2010 were used to fit the data. This has already been explained on multiple occasions.
For AL batters and pitchers:
The “PLAYOFFS” variable is either 1 or 0, and in season playoff projections are essentially current standings.
For all NL batters and pitchers:
The motivation for using WAR and WPA as primary coefficients stemmed from this post, which I found quite interesting.
At the bottom of the post I’ve attached some relevant Excel files. I’m not going to post any more about this (I’ll do Cy Young this weekend and attach the necessary files); there really shouldn’t be any reason for me to have to. I also never want to have to use or look at an Excel file ever again. But if I get enough requests via twitter/email/comments I’ll make a dedicated page that updates daily, probably using my own WAR calculations instead of bRef’s mess of drivel, and some server-side scripting.
Last year the formula picked Ryan Braun and Miguel Cabrera. I think we can all agree that Verlander should not have won the MVP.
Here are the files. The “NLMVP_ODDS” and “ALMVP_ODDS” files require a data refresh and some sorting. Feel free to change the coefficients, I don’t care. Some files may be irrelevant, not sure. I just threw a bunch of seemingly related files in an archive.
*Re-post from last year. Haven’t made any changes to the formula. The example used represents a scheduled game from last season.
I’ve been saying to myself and others that the effort necessary to complete a worthy and sufficient formula for integrating the bullpen into an MLB game moneyline is not proportional to the degree of change in the moneyline. For some reason I was under the impression it wouldn’t make that much of a difference, for most bullpens hover around a 4 ERA (3.5 to 5 at the most), and the ratio of bullpen innings pitched to that of a starter in any given game for both teams is such that any projected runs allowed would not see a consequential increase when subjected to the mark of a bullpen.
However, I now realize that I was absurdly wrong on so many levels. After having finally found the proper formula concoction to conduct a bullpen variance, the bullpen adjusted line and the original created line (calculations explained here with changes discussed here) often differ by 40 cents. So let me explain as best I can how I was able to add a satisfactory number to project the impact a team’s bullpen may have on any one particular game.
The created line as it previously stood, was a 162 game projection using ZiPS, CHONE, or actual data, of the starting pitchers’ projected runs allowed of 9IP per game, and a team’s projected runs scored over the season. This gives a nice standard number of runs scored vs runs allowed, used to measure the Pythagorean winning percentage.
How would I add a bullpen factor that depends on the starting pitcher? What I thought best was to take the projected innings pitched by the listed starting pitcher, divided by the number of games he is projected to start (the projections are, for now, CHONE and the ZiPS in-season projections, which are updated regularly). Then express that number as a percentage of 9. Multiply the resulting percentage by the number of runs he is projected to allow over 162 games, and you get the total number of runs surrendered by that starting pitcher over an entire regular season. Now you still have a certain percentage left over to use as the normalizer of the two dimensional bullpen projection (bullpen ERA * 162), using the YTD statistics as ERA. Multiply this two dimensional projection of the bullpen by the percentage left over from the starter’s predicted innings pitched per start, and you have bullpen runs allowed over an entire regular season if that particular starting pitcher pitched every game. Add the two numbers, and this should equate to a solid indicator of how a bullpen might regulate the team’s expected pitching performance.
Here is an example of a calculation using Randy Wolf (listed 4/22 vs Nationals):
Projected Runs allowed = Runs / IP * 1460 (allows for randomness)
Runs allowed = (87 / 183.3) * 1460 = 692.96
Adding Bullpen Variable:
IP / GS = IP per game
IP per game = 183.3 / 31 = 5.91
IP per game / 9 = Percentage of IP per game
Percentage of IP per game = 5.91 / 9 = 66%
Projected runs per 162 games = 66% * 692.96 = 457.36
(1 - Percentage of IP per game) * 9 = Projected Bullpen IP per Randy Wolf start
(1 - 66%) * 9 = 34% * 9 ≈ 3.09 Bullpen IP/g (9 - 5.91, before rounding the percentage)
ERA * 162 = Runs allowed (Flat projection over 162 game season)
Bullpen runs allowed = 5.85 * 162 = 947.7
Bullpen runs allowed * Bullpen percentage of IP per Randy Wolf start
Bullpen runs projected per 162 Randy Wolf games started = 947.7 * 34.3% ≈ 325.07
Randy Wolf projected runs per 162 games + Bullpen Runs projected over 162 Randy Wolf games started
Total expected runs allowed per 162 games = 457.36 + 325.07 = 782.43
782.43 runs allowed is the bullpen adjusted runs per 162 Randy Wolf games started
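The steps above can be sketched in a few lines of Python. This is only a sketch of the arithmetic as described, not the actual spreadsheet: the function and parameter names are made up for illustration, and the 1460 innings scale is taken straight from the example. Because it carries unrounded values instead of rounding the starter's share to 66%, it lands at roughly 780 runs rather than the 782.43 shown above.

```python
def bullpen_adjusted_runs(sp_runs, sp_ip, sp_gs, bullpen_era,
                          games=162, innings_scale=1460):
    """Runs allowed over `games` starts by one pitcher, bullpen included.

    All names here are illustrative; the defaults mirror the worked example.
    """
    # Starter's runs allowed as if he pitched every inning of the season.
    starter_full = sp_runs / sp_ip * innings_scale
    # Share of a nine-inning game the starter is projected to cover.
    starter_share = (sp_ip / sp_gs) / 9
    # Flat ("two dimensional") bullpen projection: ERA times games.
    bullpen_full = bullpen_era * games
    # Weight each projection by the share of the game it covers.
    starter_part = starter_share * starter_full
    bullpen_part = (1 - starter_share) * bullpen_full
    return starter_part + bullpen_part


# Randy Wolf example: 87 runs, 183.3 IP, 31 starts, bullpen ERA 5.85.
total = bullpen_adjusted_runs(87, 183.3, 31, 5.85)
print(round(total, 2))
```

The same function can be called for the opposing starter and bullpen, which gives both sides of the runs scored vs runs allowed comparison on the same footing.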
Let’s compare the numbers, by use of words, and look for proportional reciprocality to ensure a consistent method of calculation.
Randy Wolf’s ERA of 4.27 is considerably lower than Milwaukee’s current bullpen ERA, which is a dreadful 5.85. With the two separated by about 1.6 runs per nine innings, a reasonable person would assume that Milwaukee’s bullpen will have a negative effect on games started by Randy Wolf. And the resulting numbers show as much.
We projected Randy Wolf to pitch roughly 5.91 IP per start. Using this number, his projected runs allowed (692.96) is now apportioned to his IP per start, and the result is 457.36. With Milwaukee’s bullpen ERA being considerably higher than Randy Wolf’s ERA, we would expect the bullpen variable to have a negative impact on runs allowed, meaning a consequential increase. Milwaukee’s bullpen has a two dimensional projection value of 947.7 runs allowed per 162 games. After adjusting for Randy Wolf being the listed starter, the total number of runs allowed (782.43) should be, and is, higher than the starting pitcher exclusive runs allowed (692.96).
All the variables and determinants appear to line up with rational expectation.
Now using the framework in place, a team whose bullpen has a better ERA than the listed starting pitcher would gain a larger advantage in projected runs allowed the fewer innings that starter is expected to pitch. One problem with that theory, of course: bullpen ERA is not uniformly distributed from reliever to reliever. The more the bullpen is expended, and the earlier it is called on for relief, the more the level of performance regresses to mediocrity.
For now this is the best method I can come up with. If you want the updated excel MLB linemaker, now with a column for starting pitcher predicted line, and bullpen adjusted line, comment here, email or twitter me.
Bills have first right to claim him on waivers. Might be something to keep in mind if inclined to make a remunerative guess. If he goes to the Chargers I’ll eat my hat.
This is how I’ve decided to approach any and all measures of performance, statistics, and various other factors. I use the line that Vegas convenes and triangulate all my data to the line. I did it with Basketball, Baseball, and now the next step is football. Here I’m focusing on College Football. Not sure I’ll even attempt to model an NFL database, the sport is a different animal. For the NFL, I’ll just stick with intuition and finding good fades on the internetz. And the fades emanate with resplendent fervor if you search the forum spectrum for CAPS LOCKS AND !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Back to College Football. First I want to find how substantial is the line itself. Meaning how does a team’s overall line compare to team wins. Since for gamblers, the spread is the only thing that matters, this may seem meaningless. But in order to appropriate a spread with precision and a marked sense of sharpness, how well a team performs over the season is highly correlated to wins and losses. Which subsequently can be expressed, with a measure of consequence, via margin of victory, using the proverbial Pythagorean Win Percentage calculation.
Having extracted data from Statfox using the methods described here and here (I can assure you, regardless of the tendency for my computer to show a personified resentment by way of heating up, because of all the work it has been given, this is still a very economical method of extraction), the next step is to serry the data in a fashion that is conducive to evaluation and analysis. And for me this involves calculating the average line for each team, then translating the line into wins using a least squares line. The relationship between average line and wins should obviously be highly correlated, and with a low margin of error, the wins formulated via where a line falls on the highly linear trend can be used as a starting point.
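For anyone who wants to see the line-to-wins step spelled out, here is a minimal sketch of an ordinary least squares fit in plain Python. The five data points are invented for illustration and are not the Statfox data, and the sign convention (a negative average line means the team is favored) is an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical (average line, season wins) pairs, NOT real data.
# Assumed convention: negative line = favorite.
avg_line = [-14.0, -7.0, 0.0, 7.0, 14.0]
wins = [10, 8, 6, 4, 2]

slope, intercept = fit_line(avg_line, wins)
# "Linear wins": where each team's average line falls on the fitted trend.
linear_wins = [intercept + slope * x for x in avg_line]
print(slope, intercept)
```

Comparing `linear_wins` to actual wins team by team is then one way to flag teams the market rated higher or lower than their record.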
With the actual wins, and the linear wins from average line, there are a couple more ways to find expected season wins: a direct Pythagorean expectation, points scored^x / (points scored^x + points allowed^x), and the Pythagorean formula replacing the observed points scored with the average line. (I did this with college basketball, and explained the calculation to capacity here. It’s pretty straightforward and obvious: average the team points scored and points allowed, and subtract or add half the average line margin; if the average line is below zero, add half the line to points scored, etc…).
The one thing with Pythagorean expectation that is eminently taken under consideration is the value of the exponent. Some prefer a constant using observed past data; here I used the exponent that corresponded to the lowest absolute average difference between observed and expected, moving from season to season. Different seasons induced various exponents; higher scoring environments generally relate to higher exponents, and vice versa.
In order to isolate the most ideal exponent for each season, the lowest absolute average difference (often referred to as the mean absolute difference, or MAD), is used rather than the actual difference because what I am concerned with is the error difference from actual percentage, rather than a true difference. I want everything to compare to zero. So a difference of -.05 and .05 won’t average to zero, instead both will have an absolute difference of .05. (Again I laid all the framework for this out in my college basketball dirges). There are other ways to find the best exponent, but MAD, despite its inferences, is a sound mathematical tool.
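A rough sketch of the exponent search, for illustration only. The brute-force grid from 1.0 to 4.0 in steps of 0.01 is an assumption (the original work was done in Excel and the search method isn't spelled out), and the four-team "season" is made-up data.

```python
def pythagorean_pct(points_for, points_against, exponent):
    """Pythagorean expected winning percentage for a given exponent."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def mean_abs_diff(teams, exponent):
    """MAD between expected and actual win pct over (pf, pa, actual) rows."""
    diffs = [abs(pythagorean_pct(pf, pa, exponent) - actual)
             for pf, pa, actual in teams]
    return sum(diffs) / len(diffs)


def best_exponent(teams, low=1.0, high=4.0, step=0.01):
    """Grid search (an assumed method) for the exponent with the lowest MAD."""
    n_steps = int(round((high - low) / step))
    candidates = [low + i * step for i in range(n_steps + 1)]
    return min(candidates, key=lambda x: mean_abs_diff(teams, x))


# Hypothetical season: (points scored, points allowed, actual win pct).
season = [(420, 250, 0.77), (350, 300, 0.58),
          (300, 300, 0.50), (250, 380, 0.25)]

exponent = best_exponent(season)
print(exponent, mean_abs_diff(season, exponent))
```

Running `best_exponent` once per season gives the season-by-season exponents; taking absolute values inside `mean_abs_diff` is what keeps a -.05 and a +.05 miss from cancelling out.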
All the data from 2002-2009 were calculated with this exact framework in mind. There is one more thing I did in order to rate the teams from the last eight years. This will be just a slight digression, but one that I think is very meaningful and explanatory. Instead of only finding the average line, I decided to do an adjustment to opponent’s average line. I extracted every opponent’s line for each team (more VBA code), found the average line for that opponent, and subtracted the difference. It’s overly simple, and a more precise formulation is probably waiting to be found, but regardless, once sorted by adjusted line, the order of the teams took on a different look, similar to what I did with Starting Pitcher Line Weight. For example, in 2009 instead of Nevada being in the top 25 in average line, once adjusted their ranking moved down to 40 or so. Same with Boise State or TCU: the teams in the second tier conferences move down a few slots after the adjustment, which is what I set out to do. The formula is direct and simple, and the result is substantial enough to warrant an inclusion of the formula into a sort of ranking system.
I’ll just show the table for 2009, but here is the average line and adjusted line top 10:
AVERAGE LINE 2009
ADJUSTED LINE 2009
(For you Hokie fans out there, VaTech was rated 11th in both with and without adjustment)
The lightish purplish pinkish (or how about just pink) cells show the descriptive statistics. It’s amazing that I added an average line cell. It’s obviously zero, right? Nope. -0.37 can be seen as an error value for Statfox I guess.
Linemakers are the least appreciated operation in terms of level of sophistication and intelligence on the market. Another thing that should be mentioned is the variance of the opponents’ line. For the most part, teams schedule neutral teams, or the schedule is such that there is an equilibrium between the degree of difficulty and the degree of cupcake. Maybe a team like Troy will schedule Georgia and Florida, and be underdogs of 20 to 30, but once they get into Sun Belt conference play their average conference line may be around 7-10 point favorites. The weight of conference games is merely showing its effect here. Because the distribution of league wins (and the average line) is for the most part Gaussian (a tendency to cluster around a 50% winning percentage, or a line of zero), I can make the assumption that 2/3 of the teams schedule opponents that range from a line of 2.27 to -2.27, which is an indication of how well balanced conference play is in my opinion.
With the above data merely as a nice ranking system, and perhaps a starting point for future team metric formulations, here are all the actual vs expected win measures from 2002 to 2009 (which I mentioned before the digression; BTW I used the actual average line to find expected wins, rather than adjusted, for arbitrary reasons) sorted by adjusted line, only showing the top ten. The exponent used for each season is in the yellow header beside the year. It’s pointless to wait for the 2010 season to end to find the most optimal exponent, since doing so would imply the season has already ended and team evaluations finished without exploiting the data for the sake of gambling on the teams. Judging by the table, I think an exponent of around 2.24 should suffice in Pythagorean calculations to assess team by team scenarios for the 2010 season. If you want all the data email me and we will have to work something out. Perhaps a data swap or sign up for one of the affiliates and I’ll send you some of my Excel sheets. I need a list of returning starters (already have 2008 and 2009) and preseason sportsbook future win totals back to 2002 (have 2009 for 47 teams).
*The Pink Cells are average for all teams for the respective seasons
*Texas and USC just beat the shit out of everybody in 2005
*Games scheduled against teams for which no line was placed on the game were excluded from win evaluations. Therefore a Team A with a higher line than Team B can still have fewer Pythagorean line wins and linear line wins.
A brief survey of all the data demonstrates that team wins predicted by least squares fit is, for the sport as a whole, the best way to compare to actual wins, or at least the more accurate method. This is clear purely by looking at the average wins. The Pythagorean formulas seem to invariably overshoot expected wins throughout all teams on average for each season. I’ve always thought the wins founded by the linear relationship between line and wins is one of the best ways to measure how over or underrated a team is. Like I said before, this sort of model worked very well for college basketball and the NBA, even though one year success rate could be completely random fluctuation of variance, and has performed adequately enough in MLB. (Remember I was down 15x before implementing the all-encompassing starting pitcher behemoth of which I only allow a handful of people to use)
For Vegas (or nowadays Pinnacle) has a collection of the best and most sophisticated equalizers for team performance that even seems to transcend how the team performs and what the team thinks of themselves. They’ve manifested an entirely new way to judge how good teams are, though their results are largely ignored by the MSM because of the bad stigma surrounding sports gambling. Vegas knows best, always.
Eventually, I’ll post how the different statistical variables impact, or correlate to, the average line (yards per play, yards per point, rushing yards per attempt differential?), and perhaps find a nice and easy formula using coefficients as weights to find an expected line. Then I can start running year to year regression, how each variable extrapolates to the following season’s average line, etc…
This could (will?) allow me to disclose an ATS W/L record that might translate to an optimal level of expectation vs the observed record (again I did something similar with college basketball in trying to predict ATS records, though not nearly as involved).
If you are like me, which is not a strong possibility since you are not reading this, then searching around for the best number has elevated itself beyond merely an ancillary hobby, and is the eminent thief of your precious time. So now that we have entered into the abyss of the industry (abyss being realized once you start betting WNBA 3rd Quarter team totals), the most intriguing aspect is discovering new books that are, or at least we hope that they are, incompetent. Not to say Digibet is one of these books; they may very well be truly adept at analyzing what bettors want to do and have found a way to position themselves into a market that they can take advantage of.
Anyway, Digibet is essentially a slow moving WNBA book. Their most welcoming quality is their propensity for fixed beaver ball, sorry for the inappropriate language though it is within the expectation of my target demographic, WNBA lines.
One issue: Their lines being fixed means static odds. The vig is -118, and stays -118 for the duration of the moving market up until game time. But here is just one example, and there are many, of Digibet’s WNBA lethargy:
Phoenix @ Seattle – June 6
Now the question is, how much is a WNBA line worth? Standard operating procedure suggests each half point is 10 cents. Here we have 1.5 pts for 8 cents, for a difference of 22 cents. So already by betting the line you beat the market by 22 cents. But what is the weight placed on getting the best number in WNBA vs the juice expenditure? Because the lines probably aren’t as tight in the WNBA as other sports, missing a number by 1.5-2 pts, while not the ideal scenario, may not prove as costly as in any other spread sport. Not sure, it depends; it’s a question that calls for comprehensive statistical analysis, using expected growth models under each condition. Which would be fun, but I’m probably not going to do it.
Back to Digibet for a second: they claim not only to take credit card deposits, but that withdrawal requests go right back onto the credit card, and this is for US customers.
The layout of the website has its navigational complications, but once you decipher the design, it’s pretty basic.
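The 22 cent figure is just the rule of thumb applied mechanically. A tiny sketch, assuming the 10-cents-per-half-point rule holds for this market (which is exactly what's in question for the WNBA); the function name is made up for illustration.

```python
CENTS_PER_HALF_POINT = 10  # rule-of-thumb value, assumed to apply here


def line_edge_cents(points_gained, extra_juice_cents):
    """Value of the extra points in cents, minus the extra juice paid."""
    half_points = points_gained / 0.5
    return half_points * CENTS_PER_HALF_POINT - extra_juice_cents


# 1.5 points better than the market, at 8 cents of extra juice.
edge = line_edge_cents(1.5, 8)
print(edge)
```

If a half point in the WNBA is really worth less than 10 cents, lowering `CENTS_PER_HALF_POINT` shows how quickly the edge shrinks.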