Extracting and Surveying Line Movement to Measure a Model


Quickly, a nice and efficient way of extracting lines, before I delve into explaining how I’m attempting to use line movement as a measure of success.

Excel macros are very easy to learn by recording macros and then editing the recorded or built-in code.  If you understand the basic methods of looping in any given language (for Excel macros, the language is the manageable Visual Basic), you can have a lot of power at your fingertips.

Here is my code to extract some archived closing lines, indirectly from Pinnacle by way of SBRForum.  Some may already be fortunate enough, and have the foresight, to realize the advantage of parsing the Pinnacle lines daily at close.  I’m either too lazy or never thought about it, but there are ways around such neglectful and myopic behavior.

The date information can obviously be changed, though you might have to get creative to handle different months of data. There is probably an easy formula waiting to be found to convert the raw sequential date numbers to match SBRForum (perhaps use a typical date format [yyyy/mm/dd] and strip the “/”, replacing it with “” within the code), but the format must fit the link structure of whatever website you use. My earlier extraction posts using StatFox have different link formatting. Adapt to the site.  This worked wonders for me.
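The date handling described above can be sketched in Python rather than in the macro itself. This is only a sketch: the URL template and the date range are placeholders I made up for illustration, not the real SBRForum link structure, so substitute whatever the site actually uses.

```python
from datetime import date, timedelta

# Hypothetical link template (an assumption); replace with the site's real structure.
URL_TEMPLATE = "https://example.com/odds/?date={stamp}"


def date_stamps(start, end):
    """Return yyyymmdd strings for every day from start to end, inclusive."""
    days = (end - start).days
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days + 1)]


# Illustrative range that crosses a month boundary.
stamps = date_stamps(date(2010, 8, 30), date(2010, 9, 2))
urls = [URL_TEMPLATE.format(stamp=s) for s in stamps]
print(stamps)
```

Letting a date library step through the calendar avoids hunting for a formula that copes with month lengths; the same list of stamps can be reformatted for a site that wants a different link format.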


Now to measuring the success of a model.

It’s hard to evaluate line movement.  What moves a line, and how much faith do we have in market efficiency vs human rational behavior?  For any given game, one can make the assumption that there is a 50% probability of a line moving with you or against you.  That would appear to be a rather simplified observation given the nature of line movement.  But in theory, it’s justified.

Let’s look at using line movement to assess the sophistication and accuracy of a model.  Taking my last four days of MLB wagers (governed by my MLB model), giving a sample size of 27, I’ve made a graph of my wagered line versus the Pinnacle close (I think this is practical because I typically bet the Pinnacle line or better due to my myriad sportsbook options).

In order to do this, the lines have to be converted to a more manageable number.  The US line format isn’t conducive to a comparative dataset.  Based on my preference, I converted each line to its respective implied win probability, outlined here; feel free to use my odds calculator.
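For reference, the conversion from a US line to an implied win probability can be sketched as below. The function name is my own for illustration, and the result is the raw implied probability with the bookmaker's margin still included.

```python
def implied_probability(us_line):
    """Convert a US (moneyline) price to its implied win probability.

    Favorites (negative lines): risk / (risk + 100).
    Underdogs (positive lines): 100 / (line + 100).
    """
    if us_line < 0:
        risk = -us_line
        return risk / (risk + 100)
    return 100 / (us_line + 100)


for line in (-110, 100, 150):
    print(line, f"{implied_probability(line):.2%}")
```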

First, here is the graph, with the Pinnacle close in red superimposed over my line in blue.

What this graph tells us is that the line is basically in stasis after I submit my wagers.  The blue sits almost exactly on the same path as the red, so there is not much to draw from the graph other than that the average difference probably approaches zero.  I should point out that I am constrained by the sample size.  Four days’ worth of bets against the 5 months of the season observed so far is hardly an adequate sample for measurement.  But experimenting on a small sample lays the framework for testing other scenarios in the future using similar methods.

Keep in mind the concept of a 50/50 probability of the line moving with or against you.  Therefore it’s rational to assume the average difference between the closing line and the line I wagered should approach zero.  The usual tools for normally distributed data are sufficient to produce reasonable estimates.

Percentage > 0%: 52%

This means that in 52% of the n = 27 wagers the line movement, however slight, was in my favor; that is, the difference between the close and my line was greater than zero.

For example, say Team A was -110 at the time I made the bet and the Pinnacle closing line for Team A was -112. That is a difference of -2 cents on the US line, or roughly +0.4% in implied probability (about 52.8% at close versus 52.4% when I bet).

Using that approach across the entire sample, and assigning a boolean (T | F = 1 | 0) to each difference according to whether it is above zero, the line moved ‘with’ me 52% of the time.
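A minimal sketch of that tally follows. The four price pairs are made up for illustration (only the -110 / -112 pair comes from the example above); they are not my 27 wagers, and the function names are hypothetical.

```python
def implied_probability(us_line):
    """Convert a US (moneyline) price to its implied win probability."""
    if us_line < 0:
        return -us_line / (-us_line + 100)
    return 100 / (us_line + 100)


def share_moved_with(pairs):
    """pairs: (line at bet time, closing line) for the side wagered on.

    Returns the fraction of wagers where close minus bet is above zero,
    along with the list of differences in implied probability.
    """
    diffs = [implied_probability(close) - implied_probability(bet)
             for bet, close in pairs]
    flags = [1 if d > 0 else 0 for d in diffs]  # T | F = 1 | 0
    return sum(flags) / len(flags), diffs


# Illustrative data only (an assumption, not the real sample).
sample = [(-110, -112), (120, 125), (-105, -105), (135, 130)]
share, diffs = share_moved_with(sample)
print(f"{share:.0%}", [f"{d:+.2%}" for d in diffs])
```

A line that closes exactly where it was bet counts as 0 here, since the test is strictly greater than zero.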

Now, the descriptive statistics of the four day sample are below, showing mean, standard deviation, and standard error estimation, labeled appropriately.

         My Line    Close     Avg D
μ        50.89%     50.92%    0.03%
σ         7.07%      6.89%    1.13%
SEM                           0.22%
CI95↑                         0.46%
CI95↓                        -0.40%

Surveying the table, the conclusion can be reached that my betting habits function around a devotion to orthogonality. The average implied probability of the teams I wager on hovers around even, with very little fluctuation in line movement from the time I bet to close. I should note that these wagers are typically placed between 9:00 am and 10:00 am EST. I try to maintain consistency in that respect, to combat the inconsistencies of keeping a database using the lines posted when acted upon. One would expect there to be a mutual understanding of rationing line movement as it relates to information and time. To some degree that is why some sportsbooks choose to appropriate overnight lines. If this were not the case, books would leave themselves vulnerable to night vultures, terrorizing the market during the night with a sagacious eye for arbitrage. So some natural constraining mechanism in the market has to be in practice.

As I mentioned, the data has an asymptotic quality: the sample average is likely to converge toward the true mean as the sample size approaches infinity. This applies to the line differential, not the actual line I wagered or the closing line. Because of the expected equi-probable scenario of line movement, up or down, the central limit theorem pushes the average of the differentials toward a normal distribution. You may have noticed I used the term ‘likely tendency’ instead of constructing a guarantee. If one truly creates a +EV model, the product’s efficacy will be reflected in the line movement. Unless, of course, one is just extremely lucky.  Distinguishing luck from efficacy is the very essence of the model survey.

The standard error is the next aspect of digesting a tiny sample size. It estimates how far the sample mean is likely to fall from the true mean given the sample size. The point of emphasis is the Average Differential (Avg D). As one knows (Wikipedia), the standard deviation of a normal distribution indicates where each data point falls in relation to the mean. In a normal distribution, roughly two-thirds (about 68%) of the data fall within one standard deviation of the mean, one standard deviation here being 1.13%. A standard error of 0.22% may seem minuscule.  However, take into account that a probability percentage translates to money, in this case most likely cents. 0.22% is about $1 for every $100 bet on two possible outcomes of one equi-probable event with very subtle line movement. So a +100 line versus a -101 line (roughly 0.22% in probability) equals $1.  After making 500-1000 wagers, that $1 has a habit of accumulating.

The 95% confidence interval (CI, the mean ± 1.96 standard errors) runs from -0.40% to 0.46% and contains zero, which is consistent with our prior assumption that the difference in line movement on wagers placed centers on zero.
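The SEM and interval in the table can be reproduced from the Avg D mean, its standard deviation, and the sample size. The sketch below uses the normal value 1.96 as in the text; with only 27 wagers a t-based multiplier (about 2.06) would give a slightly wider interval. The function name is my own.

```python
import math


def sem_and_ci(mean, sd, n, z=1.96):
    """Standard error of the mean and a normal-approximation confidence interval."""
    sem = sd / math.sqrt(n)
    return sem, mean - z * sem, mean + z * sem


# Avg D figures from the table, in percentage points, with n = 27 wagers.
sem, lower, upper = sem_and_ci(mean=0.03, sd=1.13, n=27)
print(f"SEM {sem:.2f}%  CI95 [{lower:.2f}%, {upper:.2f}%]")
# prints: SEM 0.22%  CI95 [-0.40%, 0.46%]
```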

The great thing about a conservative model is that its risks are tethered to the orthogonal nature of the model.  I will in all probability rarely see a stream of wagers destroy the bankroll or massively reduce its growth potential.  However, the more desirable outcomes at the opposite extreme will meet a similar fate: a regression to some break-even point.
