MVP and Cy Young Odds Evaluation Process
Posted by SportsObjective in Featurific, MLB, Visual Basic on July 28, 2010
I decided to take a look at the MVP and Cy Young odds. To do so, I felt it appropriate to reuse the Excel macro framework covered in my college football and, subsequently, line movement survey posts. Both required some form of data extraction, using different sites but similar programming methodology.
The preeminent site for baseball awards and statistics is, as one knows, Baseball-Reference.
I haven’t analyzed the numbers yet, but thought it beneficial to share the extraction code for those inclined to undertake a related odds evaluation.
Here is the Excel workbook with 2000-2009 data already in place, with sheets labeled according to the date. Ten years of data seems sufficient to me. From my standpoint, further experiments in finding fair-value odds on these MVP and Cy Young futures would require including the futures odds themselves.
I’ll post the results in due course.
Extracting and Surveying Line Movement to measure a Model
Posted by SportsObjective in Betting, Featurific, MLB on July 27, 2010
First, a quick and efficient way of extracting lines, before I delve into how I’m attempting to use line movement as a measure of success.
Excel macros are easy to learn by recording macros and then editing the recorded or built-in code. If you understand the basic methods of looping in any given language (for Excel macros, the language is the manageable Visual Basic), you can have a lot of power at your fingertips.
Here is my code to extract some archived closing lines, indirectly from Pinnacle by way of SBRForum. Some may already have had the foresight to parse the Pinnacle lines daily at close. I was either too lazy or never thought about it, but there are ways around such neglectful and myopic behavior.
The date information can obviously be changed, though you might have to get creative to handle data spanning different months. There is probably a simple formula waiting to be found that converts raw sequential date numbers to the form SBRForum uses (perhaps take a typical date format [yyyy/mm/dd] and strip the “/” within the code), but the format must fit the link structure of whatever website you target. My earlier extraction posts using StatFox have different link formatting. Adapt to the site. This worked wonders for me.
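For those who would rather not hunt for a date formula, here is a minimal Python sketch of the idea (not the Excel macro itself). It walks a date range that crosses a month boundary and produces yyyymmdd stamps with the “/” stripped. The URL template and its `date` parameter are placeholders I made up for illustration; the real link structure has to be read off the site you are extracting from.

```python
from datetime import date, timedelta

# Hypothetical URL template: the real SBRForum link structure is not shown in
# this post, so the address and the "date" parameter are placeholders only.
URL_TEMPLATE = "https://example.com/odds/mlb/?date={stamp}"


def date_stamps(start, end):
    """Yield one yyyymmdd string per calendar day, from start to end inclusive."""
    current = start
    while current <= end:
        # strftime handles month and year rollovers, so no manual arithmetic
        yield current.strftime("%Y%m%d")
        current += timedelta(days=1)


def build_urls(start, end):
    """Return one URL per day with the date stamp dropped into the template."""
    return [URL_TEMPLATE.format(stamp=stamp) for stamp in date_stamps(start, end)]


# Example: four days that straddle the end of July
urls = build_urls(date(2010, 7, 30), date(2010, 8, 2))
for url in urls:
    print(url)
```

Letting the date library do the counting avoids the month-boundary problem entirely; only the template needs to change from site to site.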
Now to measuring the success of a model.
It’s hard to evaluate line movement. What moves a line, and how much faith do we have in market efficiency versus rational human behavior? For any given game, one can assume there is a 50% probability of a line moving with you and a 50% probability of it moving against you. That is a rather simplified view given the nature of line movement, but in theory it’s justified.
Let’s look at using line movement to assess the sophistication and accuracy of a model. Taking my last four days of MLB wagers (governed by my MLB model), giving a sample size of 27, I’ve made a graph of my wagered line versus the Pinnacle close (I think this is practical because I typically bet the Pinnacle line or better due to my myriad sportsbook options).
To do this, the lines have to be converted to a more manageable number. The US line format isn’t conducive to a comparative dataset. Based on my preference, I converted each line to its respective implied win probability, outlined here; feel free to use my odds calculator.
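As a rough illustration of that conversion, here is a short Python sketch using the standard formula for US (moneyline) odds. The function name is my own for this example, and the result is the raw implied probability with no adjustment for the bookmaker's margin.

```python
def implied_probability(us_line):
    """Convert a US moneyline (e.g. -110 or +150) to an implied win probability.

    Favorites (negative lines): risk / (risk + 100)
    Underdogs (positive lines): 100 / (line + 100)
    No vig is removed; this is the raw break-even probability.
    """
    if us_line == 0:
        raise ValueError("A US line of 0 is not a valid price")
    if us_line < 0:
        risk = -us_line
        return risk / (risk + 100)
    return 100 / (us_line + 100)


for line in (-110, 100, 150):
    print(f"{line:+d} -> {implied_probability(line):.2%}")
```

Working in probabilities puts favorites and underdogs on the same scale, which is what makes the wagered line and the close directly comparable.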
First, here is the graph, with the Pinnacle close in red superimposed over my line in blue.
What this graph tells us is that line movement is basically in stasis after my wagers are submitted. The blue sits almost exactly on the same path as the red, so there is not much one can draw from the graph other than that the average difference probably approaches zero. I should point out that I am constrained by the sample size. Four days’ worth of bets against the five months of the season observed so far is hardly an adequate sample for measurement. But experimenting on a smaller sample lays the framework for applying the same methods to larger samples in the future.
Keep in mind the concept of a 50/50 probability of the line moving with or against you. It is therefore rational to assume the average difference between the closing line and the line I wagered should approach zero. The usual normal-distribution methods (mean, standard deviation, standard error) are sufficient to summarize the data.
Percentage > 0%: 52%
This means that in 52% of the sample (n = 27) the line movement, however slight, was in my favor; that is, the difference between the close and my line was greater than zero.
For example, say Team A was -110 at the time I made the bet and the Pinnacle closing line on Team A was -112. That is a difference of -2 cents on the US line or, converted to implied probability, about 0.4% (52.4% versus 52.8%).
Using that particular approach, applying to the entire sample, and assigning a boolean proposition (T | F = 1 | 0) to above or below zero, the percentage of time the line movement moved ‘with’ me, was 52%.
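The same bookkeeping can be sketched in a few lines of Python. The four wager pairs below are made up purely for illustration (they are not my actual bets), and the function names are my own.

```python
def implied_probability(us_line):
    """Raw implied win probability for a US moneyline (no vig removed)."""
    if us_line < 0:
        return -us_line / (-us_line + 100)
    return 100 / (us_line + 100)


def line_differentials(bets):
    """Closing implied probability minus wagered implied probability, per bet."""
    return [implied_probability(close) - implied_probability(mine)
            for mine, close in bets]


def share_moved_with_me(diffs):
    """Fraction of bets whose differential is strictly greater than zero."""
    flags = [1 if d > 0 else 0 for d in diffs]  # T | F = 1 | 0
    return sum(flags) / len(flags)


# Hypothetical (my line, Pinnacle close) pairs, for illustration only
sample = [(-110, -112), (120, 125), (-105, -105), (135, 130)]

diffs = line_differentials(sample)
share = share_moved_with_me(diffs)
print([f"{d:+.2%}" for d in diffs])
print(f"Percentage > 0%: {share:.0%}")
```

Note that a line that does not move at all counts as "not with me" under this rule, since only differences strictly above zero score a 1.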
Now, the descriptive statistics of the four day sample are below, showing mean, standard deviation, and standard error estimation, labeled appropriately.
| | My Line | Close | Avg D |
| μ | 50.89% | 50.92% | 0.03% |
| σ | 7.07% | 6.89% | 1.13% |
| SEM | | | 0.22% |
| CI95↑ | | | 0.46% |
| CI95↓ | | | -0.40% |
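For anyone wanting to reproduce the Avg D column, the standard error and the 95% bounds follow from the mean, standard deviation, and sample size. Here is a minimal Python sketch that uses the normal critical value of 1.96; the function name is my own, and the inputs are the rounded figures reported above.

```python
import math


def sem_and_ci(mean, sd, n, z=1.96):
    """Return (standard error, lower bound, upper bound) for a sample mean.

    Uses the normal approximation: mean +/- z * sd / sqrt(n).
    """
    sem = sd / math.sqrt(n)
    half_width = z * sem
    return sem, mean - half_width, mean + half_width


# Values for the average differential, in percentage points
sem, ci_low, ci_high = sem_and_ci(mean=0.03, sd=1.13, n=27)
print(f"SEM   = {sem:.2f}%")
print(f"CI95↑ = {ci_high:.2f}%")
print(f"CI95↓ = {ci_low:.2f}%")
```

With only 27 observations, a Student's t critical value (about 2.06 for 26 degrees of freedom) would give a slightly wider interval than 1.96 does.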
Surveying the table, one can conclude that my betting habits revolve around a devotion to orthogonality. The average implied probability of the teams I wager on hovers around even, with very little movement in the line from the time I bet to the close. I should note that these wagers are typically placed between 9:00 am and 10:00 am EST. I try to maintain consistency in that respect, to limit the inconsistencies that come with keeping a database of lines as posted at the moment I acted on them. One would expect line movement to be rationed in relation to information and time. To some degree that is why some sportsbooks choose to offer overnight lines. If this were not the case, books would leave themselves vulnerable to night vultures, terrorizing the market during the night with a sagacious eye for arbitrage. So some natural constraining mechanisms in the market have to be in practice.
As I mentioned, the data has an asymptotic quality: the average line differential is likely to converge toward its mean as the sample size approaches infinity. This applies to the line differential, not the actual line I wagered or the closing line. Because line movement is expected to be equi-probable, up or down, the central limit theorem implies that the average differential is approximately normally distributed. Note that I used the term ‘likely tendency’ rather than constructing a guarantee. If one truly creates a +EV model, the model’s efficacy will be reflected in the line movement, unless one is just extremely lucky. Distinguishing luck from efficacy is the very essence of the model survey.
The standard error is the next aspect of digesting a tiny sample size. It estimates how far the sample mean is likely to fall from the true mean given the sample size. The point of emphasis is the Average Differential (Avg D). As one knows (Wikipedia), the standard deviation of a normal distribution indicates how far data points fall from the mean: roughly two-thirds (about 68%) of the data fall within one standard deviation, which here is 1.13%. A standard error of 0.22% may seem minuscule. However, take into account that a percentage probability translates to money, in this case cents. 0.22% is about $1 for every $100 bet on one of two possible outcomes of an equi-probable event with very subtle line movement. So a +100 line versus a -101 line (roughly 0.25% in probability, close to the 0.22% standard error) equals $1. After 500-1000 wagers, that $1 has a habit of accumulating.
Zero falls within the 95% confidence interval (CI, the mean ± 1.96 standard errors), which is consistent with our prior assumption that the difference between the closing line and the wagered line centers on zero.
The great thing about a conservative model is that the risks are tethered to the orthogonal nature of the model. I will in all probability rarely see a stream of wagers lead to the destruction of my bankroll or a massive reduction in growth potential. However, the same approach limits the more desirable outcomes at the opposite extreme: results regress to some break-even point.
From SBRForum: Bayesian Probability Estimation
Posted by SportsObjective in Betting, Featurific on July 24, 2010
Market Efficiency and Bayesian Probability Estimation via the Beta Distribution
[Ganchrow]…
What is Bayesian inference you might ask?
Well it’s really a different way of looking at probability that allows for a different methodology in forecasting. Recall that I earlier wrote that a frequentist would view the expected value of a bet as the average profit per game were the bet to be repeated an infinite number of times. Well that’s just not how a Bayesian sees the world.
A Bayesian considers the probability of an outcome as a (possibly subjective) measure of the degree of informed belief in that outcome. We talk about “informed” belief because the process of Bayesian inference involves updating one’s prior beliefs based on the availability of new evidence.
A Bayesian doesn’t in general think about hypothetical frequencies of an event given a hypothetical infinite number of repetitions because in general events can’t be repeated an infinite number of times. To estimate outcome probability what a Bayesian does is gauge prior knowledge of that event and then update that knowledge as future information becomes available.
Note that this methodology doesn’t hold value for all types of experiments, as for some events one can know everything there is to know a priori. These events (take the game of Craps for instance) are frequentist in nature, meaning that there is no value to new information (although Craps could be deemed otherwise were we to suspect either an unfair game or the presence of a skilled dice roller).
This can be a particularly convenient tool when looking at the progression of a betting line. One can build a forecast in any way one chooses and then continually reevaluate that forecast based upon the availability of new evidence (e.g., a change in the market line).
By way of contrast, a non-Bayesian might place a bet at a line of +3, and then after observing the line move to +5 simply declare his original bet “bad” in that the market hadn’t backed up his opinion. A Bayesian, on the other hand would realize that his bet was made conditioned only on the information he had available at the time the bet was made (which would have only included the then current line), and while he would almost certainly view the bet as “unfortunate”, he could accept that the bet was still a good bet at the time it was placed. Certainly he’d use the new information to revise his current opinions on game probabilities, and going a step further might even use it to discount the value of his model, but a Bayesian can accept that a decision might be perfectly valid at the time it was made, even as new information sheds doubt on it in hindsight.
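The excerpt above stops short of the math, so here is a minimal Python sketch of what updating a belief with the Beta distribution can look like. It is the textbook conjugate update for a win/loss (binomial) outcome, not necessarily the exact procedure laid out in the SBRForum thread, and the prior and the counts are hypothetical numbers chosen for illustration.

```python
import math


def beta_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing new successes and failures."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_sd(alpha, beta):
    """Standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return math.sqrt(alpha * beta / (total ** 2 * (total + 1)))


# Hypothetical prior: Beta(1, 1), i.e. no opinion about the true rate
prior = (1, 1)

# Hypothetical evidence: 14 favorable outcomes and 13 unfavorable ones
posterior = beta_update(*prior, successes=14, failures=13)

print(f"prior mean     = {beta_mean(*prior):.3f}")
print(f"posterior mean = {beta_mean(*posterior):.3f}")
print(f"posterior sd   = {beta_sd(*posterior):.3f}")
```

Each new batch of evidence simply adds to the two parameters, so yesterday's posterior becomes today's prior, which is the "update as information arrives" idea in its simplest form.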
Having only a vague understanding of the concepts, I feel less than qualified to offer commentary. But I will say that, assuming a state of rationality, new content brings a re-evaluation of the odds, probably in proportion to the content. That said, some books may feel inclined to manipulate the market to deceive their clients.
Either way, I think it counter-intuitive for some sports and essential for others. Certainly for baseball, it’s difficult to imagine one sitting in front of the computer all day and reconfiguring probability in accordance with new information. Games are every day, and sometimes teams, and gamblers, have 12 hours between the end of one game and the beginning of the next. There are obviously many different methods of baseball handicapping, but time doesn’t necessarily breed enhancement of probability measures. Not every line in baseball is driven by injuries or new information such as the weather, lineups, or listed pitchers. The line is often set and then moves in light of no information (other than money) while all other factors remain constant. This doesn’t mean that the linesmaker’s assessment of the line is as precise as it was before, if it was precise at all.
This is true to an extent for every league in every sport. Yet for football, Bayesian estimation of probability is a more appropriate institution because of the number of days between games and the number of variables that can change, or new ones that may appear, during that week.
Having said that, again we are under the assumption that humans are rational. It’s debatable whether we are endowed with the ability to truly understand how to re-assess and modify the odds based on new information. What kind of information should affect probability, and in what direction? The Bayesian philosophy is overly optimistic, and makes a rather arrogant assumption about the ability of our brains to isolate information from drivel.
More Calculator Programming
Posted by SportsObjective in Betting, Featurific, moneyline on July 23, 2010
This time in JavaScript. JavaScript is easy to absorb if you already know Visual Basic (think Excel macro language), since the basic constructs carry over. And JavaScript’s facility with web programming is more manageable and less obtrusive than PHP’s.
Here is a standalone Pythagorean Winning Percentage Calculator, and another Odds Calculator.
If you want the source code to use as an offline application, send me an email.
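In the meantime, the core of a Pythagorean calculator is small enough to sketch here in Python rather than JavaScript. This uses the classic form of the formula with an exponent of 2; the linked calculator may use a different exponent, and the run totals below are made up for the example.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage: RS^x / (RS^x + RA^x).

    exponent=2.0 is the classic value; other exponents are sometimes used.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("Run totals cannot be negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("At least one run total must be positive")
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# Hypothetical season totals, for illustration only
pct = pythagorean_win_pct(800, 700)
print(f"Expected win pct: {pct:.3f}")
print(f"Expected wins over 162 games: {pct * 162:.1f}")
```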
Windows Odds Calculator
Posted by SportsObjective in Featurific on July 19, 2010
Someone requested I make a stand-alone Windows version of the PHP Odds Calculator. So here it is. It comes with a US/Decimal switch.
It’s pretty self-explanatory; a more detailed explanation is posted at the PHP Odds post linked above.
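For the curious, the arithmetic behind a US/Decimal switch is short. Here is a Python sketch of the two standard conversions; it is not the Windows application's source, and the function names are my own.

```python
def us_to_decimal(us_line):
    """Convert US odds (e.g. -110, +150) to decimal odds (stake included)."""
    if us_line == 0:
        raise ValueError("A US line of 0 is not a valid price")
    if us_line > 0:
        return 1 + us_line / 100
    return 1 + 100 / -us_line


def decimal_to_us(decimal_odds):
    """Convert decimal odds to US odds; prices of 2.0 and above are positive."""
    if decimal_odds <= 1:
        raise ValueError("Decimal odds must be greater than 1")
    if decimal_odds >= 2:
        return (decimal_odds - 1) * 100
    return -100 / (decimal_odds - 1)


for line in (-200, -110, 100, 150):
    print(f"{line:+d} US = {us_to_decimal(line):.3f} decimal")
```

Decimal odds include the returned stake, which is why even money (+100) comes out as 2.0 rather than 1.0.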
One advantage of building a database is that I am now afforded the luxury of using my time in a variety of different ways, time that would otherwise have been conceded to handicapping. Since it only takes me about 30 seconds a day to decide which teams should receive bankroll, an immense amount of time is left for me to research and experiment.
Perhaps I’ll explore the creation of iPhone apps. I’m sure it’s Unix-based, an area in which I already have a healthy background. I’ll look into putting together and then releasing the Odds Calculator for iPhone for $24.99.


