Posts Tagged gambling
Apparently, time is on my side. Which is funny if you’ve followed my twitter fades the last century. (Obviously time is relative, what may seem like two months to you is almost certainly an eternity to tweeters of guaranteed fades, c’est moi, FML, c’est moi.)
It all started with this thought experiment. In a back room in a Las Vegas casino, you are handed a fair coin to flip. You will not be allowed to see the outcome, and the moment the coin lands you will fall into a deep sleep. If the coin lands heads up, the dealer will wake you 1 minute later; tails, in 1 hour. Upon waking, you will have no idea how long you have just slept.
The dealer smiles: would you like to bet on heads or tails? Knowing it’s a fair coin, you assume your odds are 50/50, so you choose tails. But the house has an advantage. The dealer knows you will almost certainly lose, because she is factoring in something you haven’t: that we live in a multiverse.
In any infinite multiverse, everything that can happen, will happen – an infinite number of times…How can we say that anything is more or less probable than anything else?
One procedure physicists are fond of is to draw a cut-off at some finite time, count up the number of events – say, heads and tails – that occur in the multiverse before the cut-off time, and use that as a representative sample.
It seems reasonable, but when tackling the casino experiment, something strange happens. Wherever the cut-off is drawn, it slices through some of the gamblers’ naps, making it appear as if those gamblers simply never woke up. The longer the nap, the more likely it is to be cut off, so if you do awaken, it’s more likely that you have taken a shorter nap – that is, that you flipped heads. So even though the odds seemed to be 50/50 when the coins were first flipped, heads becomes more probable than tails once you and the other gamblers wake up.
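The cut-off effect can be checked with a toy calculation. This is only a sketch under my own simplifying assumption that gamblers start their naps uniformly over a finite window that ends at the cut-off; the function name and the window length are hypothetical, not anything from the physicists' argument.

```python
def heads_share_among_awake(window_minutes, heads_nap=1.0, tails_nap=60.0):
    """Share of awake gamblers who flipped heads, under a hard cut-off.

    Assumption (mine): naps start uniformly over [0, window_minutes],
    the coin is fair, and a gambler is counted only if the nap ends
    before the cut-off at window_minutes.
    """
    # A nap of length n finishes in time only if it starts before window - n.
    heads_awake = max(window_minutes - heads_nap, 0.0)
    tails_awake = max(window_minutes - tails_nap, 0.0)
    total = heads_awake + tails_awake
    if total == 0:
        raise ValueError("window too short for anyone to wake up")
    return heads_awake / total
```

With a two-hour window, `heads_share_among_awake(120)` works out to 119/179, or roughly two thirds, even though every coin was fair. Shrinking the window pushes the share closer to 1.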
Somewhere deep down this is what J. L. Kelly, Jr. had in mind. Accidental prescience? I knew it.
Upon waking, you have new information: you know that time didn’t end. That now means it is more likely that you only slept for a minute than for an hour. After all, time could end at any minute, and an hour has an extra 59 of those to spare. Heads wins.
Ultimately, younger universes are more numerous than older universes, thus if I interpret each possible side as if it’s occurring primarily in younger universes, my probability increases at a rate proportional to y/u, where y is the number of younger universes and u the number of older universes. I’ve figured it out, guys.
The AL is actually much easier to deal with because there is no “Barry Bonds” factor. Regardless, the formula has been changing daily, and after some thoughtful and sensible analysis I’ve arrived at the conclusion that voters are not consistent evaluators of MVP candidacy. There are relationships to be found between the distribution of voting points and the metrics that we use to gauge player performance, but that is only because there are only about five players each season that could even be considered. From there the selection of the ultimate winner is mostly driven by the motives of the people voting, and where their loyalties lie (See 2006 AL MVP). To elucidate this concept, I created a graph showing WAR for each winner and average WAR for the top 5 since 1990. Now obviously I wouldn’t expect a straight line from left to right, nor a steady increase. The concept being elucidated is not one to show fault of the voters, but of the unpredictability of how voters view an MVP winner. It appears to change from year to year.
At first glance, one might think this simply is a representation of fluctuating talent. The statistic itself adjusts for league wide scoring trends for each particular season, and with each team having access to the diverse international talent pool, the average bio-mechanical limits of players are at a league-wide equilibrium, and have been since the talent pool expanded decades ago. Other than the steroid jerk from around 1998-2004, player ability, as betrayed by the left side of the graph above, hasn’t increased nor decreased drastically in any given year. The year 2000 appears to be the only anomaly on the graph, steroids notwithstanding, and Pedro Martinez and his ridiculous 10.3 WAR (4th MVP) is enough to explain the spike. Stephen Jay Gould would be proud.
Statistics are becoming more and more sophisticated, and writers/bloggers are doing whatever they can to appear more sophisticated. Thus many of them have adopted WAR among other saber-stats. Because of this general propensity, I anticipated the lower WAR values for MVP winners to be from the 90s. To some degree this is true. Dennis Eckersley won the MVP in 1992, with a WAR of 3, outstanding for a relief pitcher (WAR is a counting statistic, so relievers have lower WARs by default). And Bill James will be happy to know swing-happy Juan Gonzalez has the lowest WAR for any MVP winner since 1990, at 2.8. But there is nothing else one can take from the graph other than randomness. Even the two highest WARs are from 1990 and 1991, Henderson and Ripken respectively. Obviously I didn’t expect the creation of WAR to bring an overall increase in player ability, which would be just silly. I don’t know what I expected. Though it seems I should increase my sample size to span those years dating back to 1990, and probably further, rather than only using the last eleven seasons. Having said that, the current formula correctly selected eight of the last eleven MVP winners, so all the extra effort would probably be wasted energy. I’m only doing this to find value.
As I said in the previous post, I separated the MVP candidates into three groups: hitters, starting pitchers, and relief pitchers. This should be obvious enough, as the metrics used to define the best players in each category are drastically different.
I had been entertaining the idea of including WPA (Win probability added). Intuitively it makes sense that WPA is strongly linked to standard measures of offensive ability (AVG, HRS, RBI), as most events within a game occur when the run differential is within three runs. However, pitchers aren’t always in control of their statistical fate. At the same time WPA is taken directly from each individual event. Imagine a starting pitcher up 3-2 in the 7th inning with two outs leaving the game with runners on first and second. His replacement promptly surrenders a three-run HR. Two runs are charged to the starting pitcher, hurting his ERA, and he is now in line for the loss. At the same time, his WPA has not changed from another pitcher’s event. The last measurement taken for his WPA was whatever occurred with the batter before being replaced. Because of this, there is a conspicuous asymmetry in the relationship between raw statistics and WPA.
Obviously there are situations when hitters could see an increase in WPA while seeing a reduction in AVG, perhaps due to an error by the defense. But the impact is not as severe.
Team wins is another variable I had used, but are team wins indicative of voting trends or merely a by-product of the best players playing on the best teams? If I replace team wins with just a binary appropriation of playoff outlook (0 for no, 1 for yes), the table is more in agreement with intuition while possessing similar descriptive statistics. Take a quick gander at the last MVP update and you’ll understand why I replaced team wins with a yes/no playoff variable. Human thought can occasionally outwit statistics, as long as it suits one’s agenda.
Batter: Playoff, WAR, WPA, BA, HR, RBI, C
Pitcher: Playoff, WAR, WHIP, W%, C
The variables above represent a trend from 2000-2010, therefore some statistics, like ERA, do not translate to voting points to a certain degree. On a couple of occasions, a pitcher with a 4+ ERA received voting points, and the only reason WHIP is included is due to its lower overall variance. Nonetheless it works much better in this particular formula, and I can’t control what the voters decide. Again, I’m trying to find value based on historical data. Those who think Verlander is too low should consider that I only used data from 2000-2010, a span in which no pitcher won the MVP. If/When I include seasons dating back thirty years, Verlander’s odds may increase slightly.
Before I compared the three statistics (Line, WAR, WPA), I wanted to remove as many performance independent factors that go into a pitcher’s average line as I possibly could. There are some things that are just out of the pitcher’s control. A pitcher who started 10 games at home and 6 on the road will have about a 20% advantage in their vegas probability before anything else is taken into account. To adjust for home/road start discrepancy, I just multiplied the difference in home/road starts by .025, took the aggregate line, and divided by number of starts. Since HFA is set at 5%, each pitcher will have an increase or decrease of 2.5% in their line based on where they are pitching. I also had to adjust for opponents faced. This was fairly easy, the information is already in the SP report table, and an average pitcher has a vegas probability of .5. From there the calculation is elementary.
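Here is one reading of the home/road adjustment as a short sketch. The function name and the example inputs are hypothetical, and the exact order of operations is my interpretation of the description above: shift the aggregate line by 2.5% for each surplus home start, then average. The opponent adjustment is left out because the post does not spell it out.

```python
def venue_adjusted_average_line(aggregate_line, home_starts, road_starts,
                                half_hfa=0.025):
    """Average win probability per start with the home/road imbalance removed.

    aggregate_line: sum of the pitcher's implied win probabilities (0 to 1)
                    over all starts.
    half_hfa:       half of the 5% home-field advantage named in the post.
    """
    starts = home_starts + road_starts
    if starts == 0:
        raise ValueError("pitcher has no starts")
    # Each surplus home start inflated the aggregate by half_hfa, so take it
    # back out; a surplus of road starts adds it back in.
    surplus_home = home_starts - road_starts
    return (aggregate_line - surplus_home * half_hfa) / starts
```

A pitcher with equal home and road starts is left untouched, which is the sanity check I would want from any version of this adjustment.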
Obviously there are other things that go into line appropriation. One is public perception, which is hard to quantify. Linemakers have a panoply of information from which to draw, I would assume. I wouldn’t be surprised if there are some that keep a database of blink duration for each player, and any peaks or troughs in duration that a player may endure throughout the course of the year. Perhaps there is some relationship between change in blink duration and performance? It’s a curious thing, simultaneous blinking. Five percent of our lives are spent walking around with our eyes closed. A sequential blinker may have an advantage in avoiding any impending danger projected, such as a spear or a rock. Why there are no sequential blinkers I don’t know. One would think sequential blinkers would reproduce differentially and would be the victor in pairwise contests with simultaneous blinkers. Or perhaps not? Maybe the sequential blinking mutation just never occurred. It’s possible blinking sequentially is an impossibility, an insight deeply rooted in the bilateral symmetry of vertebrates, or the eye protein of all organisms that are motile through a transparent spectrum.
A severe tangent, a devastating yet fascinating ramble. I can say whatever I want, it’s my blog. I would actually be willing to do a research project on the correlation between blink duration and player ability. Unfortunately nobody is stupid enough to commission such an important and groundbreaking research project, and I’m not going to do it for free.
The graphs below are actually pretty interesting, as the three statistics measure player ability from three different angles. I extracted the WAR and WPA stats from Fangraphs, using only qualified players to limit any variance and outliers. As expected, the three appear to be highly correlated with one another. WAR measures raw performance, WPA measures situational performance, and SP Line, though enigmatic, can be seen as a measure of public perception. Again, three different angles of assessing player ability. The R value is for all qualified players. Descriptive statistics at this point are limited by sample size, but I don’t see any reason why more data would lower the proportion of variance explained by the relationship, especially given the statistics being looked at here.
The graphs basically have the same topographical qualities, which is interesting because WPA explicitly handles quantifying specific events during the course of a game, and fundamentally does not resolve player ability unlike WAR. However, since most events during the game occur while the run differential is plus or minus three, a player’s statistics will in all likelihood indicate what kind of WPA is to be expected. There are exceptions, of course (cough cough Arod cough cough, it should be noted Arod’s best WPA season was in 2007, finishing first that year in WPA and winning his third AL MVP award, further validating my inclusion of WPA into the MVP odds formula).
The essential uselessness of the information found on this site is overshadowed by the sense of achievement felt upon completion of the research, for me anyways. Hopefully my growing cohort of followers have actually discovered some mode of utility from my efforts.
Last post, which was about three weeks ago, I tried to isolate the pitching metrics (FIP, WHIP, ERA, etc…) that held a higher relationship with linemaker tendencies. I basically had all the data on hand, but organizing and doing all the calculations turned out to be a painstaking process, more so than I expected. The primary obstacle was overcoming the varying naming conventions used by different web sites. Cross-site consistency is lacking, and manually tracking down the names that induce error messages is not fun at all. Nevertheless, I finished.
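For anyone fighting the same naming mismatch, a small normalizer can catch the easy cases before any manual cleanup. This is a sketch, not what I actually ran; `normalize_name` and the rules in it (accent stripping, "Last, First" reordering, punctuation removal) are assumptions about what typically differs between sites.

```python
import re
import unicodedata


def normalize_name(raw):
    """Reduce a player name to a lowercase ASCII key for cross-site joins."""
    name = raw.strip()
    # Turn "Last, First" into "First Last".
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first.strip()} {last.strip()}"
    # Decompose accented letters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep letters and whitespace only, then collapse repeated spaces.
    letters_only = re.sub(r"[^a-z\s]", "", unaccented.lower())
    return " ".join(letters_only.split())
```

Two sites listing "Halladay, Roy" and "Roy Halladay" both end up keyed as `roy halladay`. Nicknames and suffixes like "Jr." would still need a hand-built lookup.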
I highly suggest you read the last post if you haven’t already. Briefly, what I did was find the expected run line for each and every pitcher given their average line and average total, compare that to their run differential using 5 different pitching metrics, adjust for bullpen performance, and find the correlation between each pitching metric and the expected run line, using every starting pitcher from 2005-2010. A correlation is presented as a unit-less number between -1 and 1; the closer the number is to -1 or 1, the stronger the relationship. In this case, its square measures the proportion of variance in the Line that can be explained by either FIP, xFIP, ERA, WHIP, or xERA. The results are below:
All these pitching stats measure some form of runs surrendered per 9 innings using raw data. So we wouldn’t expect any great deviations from an overall average. However, what is interesting is how each statistic corresponds to situational contingency, aka luck. WHIP is a measure of walks and hits per inning pitched, though how those baserunners translate into runs is in large part due to the ability of the defense and additionally dependent on the ability of the players running the bases. Often a pitcher has a low WHIP and in proportion a high ERA, or a high WHIP and a low ERA. The whipERA statistic projects all WHIP numbers to a number that resembles a runs allowed per 9 innings statistic, simply by multiplying by π. For those who are inclined to create their own models, this is useful. And given the correlations from above, it is a method that holds some validity.
FIP retrofits the weights given to defensive independent pitching statistics (HRs, Ks, BBs), and xFIP goes even further in considering situational contingency as a variable by normalizing all FIP numbers to the league average HR/FB ratio. What this may tell us is that there are further nuances of DIPS that merit consideration, or that the linemakers have rendered them as being extremely significant, more so than standard measures of runs scored and runs allowed.
Moneylines can be analytically reduced to numbers primed for comparison to performance. In baseball, the line is presented as a standard runline, 1.5, but using a modified Pythagorean Expectation one can extract a more precise line from the moneyline and total. I’ve rendered this formula as being an inverse to the standard Pythagorean Expectation created by Bill James.
We’ll use Roy Halladay as an example, and this calculator I created to find the precise run line. Using YTD vegas numbers, average moneyline and average total, the result shows an expected run differential per game conveniently expressed like a set run line.
Average Line (Win Prob): 63.1%
Average Total: 7.04
Expected Run Line: -1.07
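For those who want to play along without the calculator, here is a sketch of inverting a Pythagorean expectation. It is not the calculator's actual formula, which isn't reproduced here; the exponent is an assumption (1.83 is a commonly used value) and the function name is hypothetical. The idea: the win probability fixes the ratio of runs scored to runs allowed, the total fixes their sum, and the two together give the differential.

```python
def expected_run_line(win_prob, total, exponent=1.83):
    """Run line implied by a win probability and a game total.

    Inverts p = RS**x / (RS**x + RA**x) subject to RS + RA = total.
    Returned sportsbook-style: negative means the team is favored.
    """
    if not 0.0 < win_prob < 1.0:
        raise ValueError("win_prob must be strictly between 0 and 1")
    # RS / RA implied by the win probability.
    ratio = (win_prob / (1.0 - win_prob)) ** (1.0 / exponent)
    runs_scored = total * ratio / (1.0 + ratio)
    runs_allowed = total / (1.0 + ratio)
    # Flip the sign so a favorite shows a negative line.
    return -(runs_scored - runs_allowed)
```

Plugging in 63.1% and 7.04, the classic exponent of 2 gives about -0.94. An exponent near 1.75 happens to land close to the -1.07 quoted above, though that is only an observation about this sketch and not a statement of what the calculator uses.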
The average total required a mathematical trick to incorporate the variable vigorish (i.e. -8 -110 does not equal -8 101). I just arbitrarily set each dollar equal to 5% of a run. This would mean over -8 -120 is equivalent to -8.5 -110. Somewhat crude, but I feel it is more accurate than completely ignoring the price on the total. In Excel the input would be, assuming total is in cell A1 and price is in A2:
This formula is applied to every total for each pitcher and then the average is calculated.
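The same adjustment can be sketched outside of a spreadsheet. This is an approximation of the idea rather than a copy of the spreadsheet input: it assumes the price is the price on the over, that -110 is the baseline, and that plus-money prices sit on the same cent scale with +100 equal to -100. That last convention is my own assumption, as are the function names.

```python
def price_to_cents(price):
    """Map American odds onto one continuous cent scale (+100 equals -100)."""
    if price <= -100:
        return -price
    if price >= 100:
        return 200 - price
    raise ValueError("American odds must be <= -100 or >= +100")


def adjusted_total(total, over_price, runs_per_cent=0.05, base_cents=110):
    """Shift the posted total by 5% of a run per cent of juice on the over."""
    return total + (price_to_cents(over_price) - base_cents) * runs_per_cent


def average_adjusted_total(games):
    """Average the adjusted total over (total, over_price) pairs."""
    if not games:
        raise ValueError("no games supplied")
    return sum(adjusted_total(t, p) for t, p in games) / len(games)
```

An over of 8 at -120 comes out as 8.5, matching the example above, and 8 at -110 stays at 8.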
A negative Run Line means an expected positive run differential, because it is conveniently expressed like you would see at a sportsbook. To avoid confusion, think of the value as being reversed, positive becomes negative and negative becomes positive. Yeah, confusion avoided.
These are Halladay’s YTD performance numbers:
The Phillies average 4.2 R/G, so the corresponding run differentials are:
All but ERA/WHIPx (WHIP * π) are between .4 and .8 runs per game higher than the vegas expectation. Let’s include bullpen. This post explains how to integrate the bullpen variable; it’s simple algebra. To simplify I’ll just assume Halladay averages 7 IP/GS. The resulting run differentials:
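As a rough illustration of the algebra, the starter's runs-per-9 metric can be weighted by his share of the game and the bullpen's by the rest. This is a sketch under that assumption and may not match the linked post line for line; the function names are hypothetical, and the 2.50 and 4.00 rates used in the note below are made-up placeholders, not Halladay's or the Phillies' actual numbers.

```python
import math


def whip_x(whip):
    """Project WHIP onto a runs-per-9 scale by multiplying by pi."""
    return whip * math.pi


def runs_allowed_per_game(starter_rate, bullpen_rate, starter_ip=7.0,
                          game_innings=9.0):
    """Blend starter and bullpen runs-per-9 rates by innings share."""
    bullpen_ip = game_innings - starter_ip
    return (starter_rate * starter_ip + bullpen_rate * bullpen_ip) / 9.0


def run_differential(team_runs_per_game, starter_rate, bullpen_rate,
                     starter_ip=7.0):
    """Expected run differential for games this pitcher starts."""
    allowed = runs_allowed_per_game(starter_rate, bullpen_rate, starter_ip)
    return team_runs_per_game - allowed
```

With the Phillies' 4.2 R/G, a placeholder starter rate of 2.50 and a placeholder bullpen rate of 4.00, `run_differential(4.2, 2.50, 4.00)` comes to about 1.37 runs per game, compared with 1.70 if the starter went all nine.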
A CG is rare, this is no secret and Vegas understands this. Relative to the rest of the league, Halladay’s IP/GS is always among the top 5 in the league, yet the effect of the bullpen is extremely significant. When Halladay starts, every .1 run is roughly equivalent to a 1 percent change in win percentage.
Why do all this? Well, a moneyline, or any set odds released to the market, is multidimensional. For baseball, included in the opener are all the elements that go into creating a line: pitching, offense, defense, opponent, injuries, weather, lineup, umpire, intangibles, and public perception. After that the market is more or less random and left to the mechanisms of market heuristics. So which best demonstrates how a sportsbook may model a team’s expected performance?
Finding the correlation between what the sportsbooks release and the five pitching variables above might help to demonstrate how a moneyline is created for the entire league. Below is how each pitching metric correlates to the run line developed from the average total and average line for each starting pitcher.
R, Expected Run Line
These numbers indicate how much of the way a line is set each metric could account for. A correlation is a unit-less number ranging from -1 to 1, and it might help to just think of it as a percentage (strictly speaking, it is the squared correlation that gives the proportion of variance explained). Obviously the veracity of these values is compromised by lack of sample size. I have line data dating back to 1997, so at length I’ll post a correlation table.
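For readers who want to reproduce this kind of number, a plain Pearson correlation is enough. The sketch below is a generic implementation rather than the spreadsheet work behind the table; `pearson_r` is a hypothetical name, and the two inputs are assumed to be equal-length lists, one value per starting pitcher.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations about the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Feeding it each pitcher's metric-based run differential and his expected run line gives R; squaring the result gives the share of variance in the line that the metric accounts for.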
I haven’t made any considerations of alternative measures of offense, or made adjustments by normalizing pitcher data to league averages. The starting pitcher report here has a column for opponent starting pitcher’s average line and team average line, so some sort of adjustment shouldn’t be too hard.