Archive for category MLB

Updated NL/AL MVP Predictor

Some slight changes in MVP position and predicted points. The odds have still not been released offshore. Pujols has gone on a devastating surge and could win the Triple Crown. I still think Adrian Gonzalez has the best value in either league.

Keep in mind that, in both leagues, the players listed are all the players that registered in the predictor equation. Once some players are filtered from the equation, the odds will drop. This is because my formula for creating odds is built on the total number of predicted points divided by two: each player's predicted points are measured against that half-total. Logically, anyone with over half of the points would be declared the winner.
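As a rough sketch of that rule, assuming it works the way the paragraph reads: each player's probability is his predicted points divided by half of the pool's total (capped at 1, since holding over half the points means winning), and the odds against are (1 - p) / p. This reading is at least consistent with the AL Cy Young table further down, whose PROB column sums to roughly 200%. The function name and the point values here are hypothetical, picked only to show why filtering players out shortens everyone else's odds.

```python
def odds_from_points(points):
    """Map predicted points to (probability, odds against) per player.

    Assumption: probability = points / (total points / 2), capped at 1.
    """
    half_total = sum(points.values()) / 2
    result = {}
    for name, pts in points.items():
        prob = min(pts / half_total, 1.0)
        # Odds against; infinite for a player with no predicted points.
        odds = (1 - prob) / prob if prob > 0 else float("inf")
        result[name] = (prob, odds)
    return result


# Hypothetical predicted points, not taken from any table in these posts.
pool = {
    "Player A": 100.0, "Player B": 80.0, "Player C": 60.0,
    "Player D": 60.0, "Player E": 50.0, "Player F": 50.0,
}
full = odds_from_points(pool)

# Drop one long shot: the total shrinks, every remaining share grows,
# and the odds against each remaining player drop.
filtered = odds_from_points({k: v for k, v in pool.items() if k != "Player F"})

for name in filtered:
    print(f"{name}: {full[name][1]:.2f}/1 before, {filtered[name][1]:.2f}/1 after")
```

With the long shot removed, the half-total falls from 200 to 175, so Player A moves from 1.00/1 to 0.75/1.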


NL and AL Cy Young


The requisites for establishing a reasonable prediction of the likely Cy Young candidates were similar to what I did with the MVP. Of course different statistics needed to be applied, but the essence of the framework remained the same. Find numbers of consequence and regress.

But it doesn’t take long for one to notice, after an inventory of the candidates of years prior, that the appropriation of voting points to starters as well as closers permits the process to be negotiated via arbitrary re-configuration of statistics. This means, since closers are measured by accumulating saves, an equalizer of saves and wins serves the purpose of adjoining closers and starters into the same regression. Relievers that are not closers are S-O-L. I can assure you, with a fair amount of certainty, a middle reliever will not win the Cy Young.

The aforementioned requires some degree of mathematical ingenuity. The term arbitrary finds itself useful in describing such a process of ingenuity, in addition to being thrown around in describing similar applications. I find it to be a very flexible word of choice. I don’t think it presumptuous to denote the term as having, uniquely, a universal privilege for relating the vicissitudes of the day. When you walk to some particular location, the steps you take are not so much planned as they are the result of having authority over the steps you take for the convenience of arriving at the desired objective. Therefore the route can be considered arbitrary, subjected to the judgment of the walk-man.

Arbitrary in mathematics refers to a constant with an undetermined value. Therefore a formula can determine the value, and act as the constant itself.

During the MVP regression process, I felt it practical to provide a certain weight to players playing on teams that make the playoffs (since a playoff appearance by a player has a positive correlation to voting points), by adding the square root of wins to itself, producing the formula:

Given team wins (TW), solving for weighted team wins (TWx)

TWx = \sqrt{TW} + TW

The results served the purpose of adequacy, and I didn’t feel it necessary to experiment further. Obviously the weight will be incommensurate with the reality of the situation, though not beyond the risk of ineffably skewing the odds. I did not much deliberate on the matter simply because the resulting odds appeared to be agreeable to reason.

To proceed with creating a formula to assimilate starters and closers into one equation, I partitioned the two types of pitchers, and found the correlations. Closers are well represented in the voting practices of the writers, for both leagues.

I was initially surprised to find that team wins and playoff appearances had a negative correlation to voting points. A playoff appearance cost a player around six points. But my vexation was swiftly addressed and pacified by recalling some of the past winners. Call it the Roy Halladay effect or, concomitantly, the Lincecum effect.

More in line with reason, ERA and WAR were highly correlated to the trends of the voters, with starting pitchers and closers. As well, saves were treated as consequential with closers, and player wins with starting pitchers.

However, that is basically it. Unexpectedly, strikeouts, strikeout ratio, and strikeout/walk ratio showed very little relationship, positive or negative, with voting points. FIP and other advanced pitching metrics are still seen as invidious creations of new-age sabermetricians, and the old habits of the writers persist. (Though with the recent influx of some prominent sabermetricians into the BBWAA, hard-core baseball statistics and advanced metrics may yet usurp the conventional numbers.)

Now what is left is a painstaking gap to fill, that of saves and wins. How can the two reconcile to prescribe to the limits of sample size?

Left to the devices of my own ingenuity, it didn’t take but seven to ten minutes of concerted thought to find resolve. And using the fundamentals of the simple formula concocted from the MVP regression, demonstrated above, I merely jutted the basics until my humble vanity was content.

Wins (W), Saves (SV), solving for SVW (Saves to Wins)

SVW = \left(\frac{\sqrt{SV}}{2}\right)^2 + W

Closer wins are putatively rudiments of blown saves, so that is more or less a neutral statistic. One major difference on this occasion: I decided the conclusion should be instantiated to at least some end and with rationale, so I retrofitted the formula to a formidable interval of average wins. I simply averaged the wins of starters, then averaged the adjusted wins of closers after implementing the formula. I did this for both leagues.

AL:
SP average wins = 18.18
CP average wins = 15.87

NL:
SP average wins = 17.83
CP average wins = 14.23
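
For reference, the SVW formula above can be written as a short function. Squaring half the square root is algebraically the same as SV / 4, so a save counts for a quarter of a win. The function name is hypothetical, and the two stat lines are taken from the AL odds table later in this post.

```python
import math


def saves_to_wins(saves, wins):
    """SVW = (sqrt(SV) / 2)^2 + W, which simplifies to SV / 4 + W."""
    return (math.sqrt(saves) / 2) ** 2 + wins


# Closer: Rivera's line in the AL table, 33 saves and 4 wins.
print(round(saves_to_wins(33, 4), 2))
# Starter: Price's line in the AL table, 0 saves and 21 wins (unchanged by the formula).
print(round(saves_to_wins(0, 21), 2))
```

A starter with no saves keeps his raw win total, while a closer's saves are scaled down into the same range.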

The re-adjustment of saves to wins naturally combats the asymmetry between SP and CP ERA, so the weights of the coefficients are equalized. And to further validate the insertion of the formula, Eric Gagne remained the 2003 winner upon applying post-regression numbers.

The invention left me with a sense of great satisfaction, which I do not hesitate to admit. The arbitrary formula, pragmatically, is rather banal outside the spectrum of this particular regression. It’s a meaningless scale that is not afforded qualities of general utility in evaluating players. WAR and WPA are sufficient for comparison across subsets of positions.

Here are regression coefficients and odds.

American League

      Source |       SS       df       MS              Number of obs =      66
-------------+------------------------------           F(  3,    63) =   47.04
       Model |  165575.187     3  55191.7289           Prob > F      =  0.0000
    Residual |  73919.2691    63  1173.32173           R-squared     =  0.6914
-------------+------------------------------           Adj R-squared =  0.6767
       Total |  239494.456    66  3628.70387           Root MSE      =  34.254

------------------------------------------------------------------------------
     votepts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         war |   7.365279   2.725402     2.70   0.009     1.918998    12.81156
         era |  -29.23094   5.821598    -5.02   0.000    -40.86448   -17.59741
         svw |      5.355   1.420173     3.77   0.000     2.517011     8.19299
------------------------------------------------------------------------------

Cy Young Odds:

NAME                G  WINS  LOSSES   K/9  BB/9   ERA  SV    PROB    ODDS
David Price        32    21       7   8.4   3.8  3.03   0  21.37%   3.6/1
Jon Lester         33    18      10   9.2   3.1  3.12   0  17.75%   4.6/1
CC Sabathia        34    21       7   6.9     3  3.45   0  15.65%   5.3/1
Clay Buchholz      27    18       7   6.1   3.4  2.99   0  15.53%   5.4/1
Trevor Cahill      29    17       7   5.2   2.7  2.87   0  13.77%   6.2/1
Carl Pavano        33    21      10   5.2   1.6  3.95   0  13.27%   6.5/1
Mariano Rivera     60     4       3   7.7   1.5  1.98  33  11.97%   7.3/1
Jered Weaver       34    15      10    10   2.3  3.21   0  11.53%   7.6/1
Rafael Soriano     65     3       1   7.5   1.9  2.32  47  10.19%   8.8/1
John Danks         32    16      11     7   2.8  3.51   0  10.03%   8.9/1
Joakim Soria       67     0       3   9.7   2.4  2.05  45   9.29%   9.7/1
Cliff Lee          29    14       8   7.8   0.5  3.45   0   7.55%  12.2/1
Jonathan Papelbon  67     5       7   8.2   3.6  2.57  39   6.38%  14.6/1
Francisco Liriano  32    15      10   9.8   2.8  3.54   0   5.96%  15.7/1
Andrew Bailey      53     1       4   6.5   2.5  1.62  28   5.84%  16.1/1
Justin Verlander   33    18      10   8.4   3.3  3.81   0   4.80%  19.8/1
Jeff Niemann       30    14       4   6.5   2.8  3.29   0   4.52%  21.1/1
C.J. Wilson        33    15       7     7   4.2  3.68   0   4.18%  22.9/1
Felix Hernandez    35    11      14   8.2   2.5  2.99   0   3.92%  24.4/1
Brian Duensing     60     8       1     5   2.2  2.19   0   1.93%  50.9/1
Alexi Ogando       35     4       1   8.4   3.7  1.01   0   1.77%  55.6/1
Andy Pettitte      25    15       3     7     3  3.70   0   1.42%  69.4/1
Phil Hughes        30    19       7   7.4   2.5  4.01   0   1.39%  70.8/1

National League

Regression results:

      Source |       SS       df       MS              Number of obs =      64
-------------+------------------------------           F(  3,    61) =   49.35
       Model |  210353.074     3  70117.6914           Prob > F      =  0.0000
    Residual |  86668.9258    61  1420.80206           R-squared     =  0.7082
-------------+------------------------------           Adj R-squared =  0.6939
       Total |      297022    64  4640.96875           Root MSE      =  37.694

------------------------------------------------------------------------------
     votepts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         war |   9.735335   2.944883     3.31   0.002     3.846677    15.62399
         svw |   6.053746   1.591225     3.80   0.000     2.871894    9.235598
         era |   -37.1176   7.397198    -5.02   0.000    -51.90921   -22.32599
------------------------------------------------------------------------------

Cy Young Odds:

NAME                G  WINS  LOSSES   K/9  BB/9   ERA  SV   WAR    PROB      ODDS
Adam Wainwright    35    24       8   8.1   2.1  2.51   0  7.54  33.97%     1.9/1
Ubaldo Jimenez     33    24       4   8.4   3.6  3.00   0  7.20  27.68%     2.6/1
Roy Halladay       35    21      11   8.2     1  3.01   0  8.45  26.30%     2.8/1
Tim Hudson         33    19       7   4.9     3  3.03   0  7.55  21.14%     3.7/1
Josh Johnson       33    14       7   8.8   2.2  2.65   0  7.89  17.00%     4.8/1
Mat Latos          30    17       7   8.9   2.6  2.45   0  4.71  15.17%     5.5/1
Billy Wagner       73     8       3  12.5   2.6  2.19  40  2.47  14.45%     5.9/1
Heath Bell         71     7       0  11.5   3.7  2.55  48  2.35  11.94%     7.3/1
Johan Santana      34    14       8   6.4   2.7  3.04   0  5.90   7.60%    12.1/1
Chris Carpenter    36    18       6   7.1   2.5  3.52   0  4.75   6.99%    13.3/1
Brian Wilson       68     4       1  12.6   3.4  2.67  45  3.13   6.61%    14.1/1
Yovani Gallardo    31    15       7   9.9   3.6  3.13   0  4.08   3.90%    24.6/1
Jaime Garcia       31    14       7   7.1   3.6  2.78   0  3.21   3.47%    27.7/1
Jonny Venters      76     5       0   9.6     4  1.19   1  2.20   3.43%    28.1/1
John Axford        48    10       1  10.4     4  2.90  23  1.63   0.29%   347.4/1
Clayton Kershaw    33    14      10   9.4   3.8  3.21   0  3.81   0.06%  1764.1/1
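
To make the regression output concrete, here is a minimal sketch of how the NL coefficients turn a stat line into predicted voting points. The output lists no constant term, so the prediction is just the weighted sum of WAR, SVW, and ERA. The function names are hypothetical, and the inputs are the rounded values shown in the NL table, so the results are illustrative and not claimed to match the numbers behind the PROB column exactly.

```python
# NL coefficients copied from the regression output above (no constant term).
NL_COEF = {"war": 9.735335, "svw": 6.053746, "era": -37.1176}


def svw(saves, wins):
    # (sqrt(SV) / 2)^2 + W is algebraically SV / 4 + W.
    return saves / 4 + wins


def predicted_vote_points(war, era, saves, wins, coef=NL_COEF):
    """Linear prediction: sum of coefficient * statistic."""
    return coef["war"] * war + coef["svw"] * svw(saves, wins) + coef["era"] * era


# Wainwright's line from the NL table: WAR 7.54, ERA 2.51, 0 SV, 24 W.
starter = predicted_vote_points(war=7.54, era=2.51, saves=0, wins=24)
# Bell's line from the NL table: WAR 2.35, ERA 2.55, 48 SV, 7 W.
closer = predicted_vote_points(war=2.35, era=2.55, saves=48, wins=7)

print(f"starter: {starter:.2f}, closer: {closer:.2f}")
```

The negative ERA coefficient is what keeps a low-ERA closer in the conversation even though his SVW total trails a 20-game winner's.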

At this point there is nothing to compare the odds to. They haven’t been released offshore or in Vegas. At length I’ll revisit with some thoughts on the matter.


Refining NL and AL MVP Odds

My previous efforts demanded much attention. I made a careless mistake with WAR, and neglected to project for the remainder of the season. It does alter the position of the candidates slightly. The distribution of odds was an amateur effort on my part as well, and I have since refined the process. Now the table is far more agreeable.

Additionally, I decided to add more arbitrary measures in regressing batting average. I regressed all the candidates to .285 over 400 ABs, then, for reasons unfounded, I regressed that number to their per-year average rate (AB and AVG).
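
One common way to read "regressed to .285 over 400 ABs" is to pad each hitter's line with 400 at-bats of .285 hitting. That reading is an assumption, as are the function name and the sample hitter below; the second step (regressing toward the per-year AB and AVG rate) is not described closely enough to sketch.

```python
def regress_average(hits, at_bats, prior_avg=0.285, prior_ab=400):
    """Blend an observed average with prior_ab at-bats of prior_avg hitting."""
    return (hits + prior_avg * prior_ab) / (at_bats + prior_ab)


# Hypothetical hitter: 170 hits in 500 AB, a .340 observed average.
observed = 170 / 500
regressed = regress_average(170, 500)
print(f"observed {observed:.3f}, regressed {regressed:.3f}")
```

The more at-bats a hitter already has, the less the .285 padding moves his number.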


MVP Odds for Baseball

Any eight-year-old baseball fanatic can isolate the handful of players in each league that are likely to win the MVP. And it is justifiable to expect, with a high degree of certainty, that the winner will be chosen from that handful of players. Popularity is obviously a major indicator of likely MVP consideration. But I have a deep suspicion of any line of reasoning derived from untested opinion or visceral assumptions rather than some form of empiricism. This calls for a regression model, using the voting trends from the last ten years (2000-2009).

Qualitative measures are hard to analyze. Let’s use Carlos Gonzalez as an example. Playing with the mid-market profile Rockies, he loses qualitative points by default simply because of the perceived Coors Field offensive inflation factor, as well as not being in a major market. His stats may be considered deceiving, and for him to have a realistic chance, even though the raw numbers would indicate evidence to the contrary, the Rockies will probably have to win the NL West. Immediately this brings forth a comparison to Matt Holliday, who also played for the Rockies, the Wild Card winner in 2007, the year Holliday finished 2nd in the MVP voting despite leading the league in total bases, RBI, batting average, and doubles, tying for 2nd in the league in OPS, and posting a top-5 WAR. The eventual winner, Jimmy Rollins, playing for the major-market Philadelphia Phillies, had an underwhelming OPS+ (119), a WAR rated three spots lower than Holliday’s, and a ballpark that had yet to reach its current consensus status of bandbox.

Not saying Rollins was undeserving, but just creating a proxy for the categorical particulates that may dim the chances of certain players receiving the votes necessary to win.

Certainly there are other intangibles that can manifest themselves spontaneously, and at times erroneously, because ultimately the votes accumulated for the MVP award are attributable to the subjectivity of the writers.

To expound further in that respect, a meta-analysis, which would involve an overall abstract of each player’s popularity level and his respective team’s status in relation to the configured mindset of the writers responsible for the voting process, could be broken down into various components that enable a measure of quantification. But that would take too long.

The goal here is to search for value. To accomplish this, creating odds to compare to the actual odds is the best approach, followed by evaluating the players from a qualitative standpoint, which I won’t address here. But as the season reaches its end I will update the prevailing consensus concerning the likely top MVP candidates, and adjust my odds accordingly.

A multivariate analysis using Stata, a program that operates with a sharp understanding on the benefits of parsimonious exertion, allows for a flexible and facilitating process.

I found the correlations between each crucial stat line and the overall voting points, which is what decides the winner of the MVP. Of course, I separated the American and National League, each having their separate yet similar MVP selection process.

I stripped pitchers from the MVP equation, and will revisit them later when I assess the Cy Young odds.

The variables used were team wins, WAR, batting average, home runs, RBI, runs, stolen bases.  These were chosen after some trial and error, but the inherent value of the variables used can certainly reconcile with logic.  WAR may be considered redundant in terms of information content, as well as redundant in compatibility with the other player variables.  However, WAR has its advantages because of the all-inclusive nature of the statistic: Position adjustments, defense, sophisticated offense metrics.  It is an overall measure of player viability that has yet to hit mainstream and a faction of writers may not even attempt to recognize or consider it as part of their voting practices.  Since the regression coalesces to the tendencies of the voters by way of voting points, the most common statistics must be included, and redundancies are to be expected.  Runs and HRs are not mutually exclusive variables, neither are Home runs and Slugging, or even Home runs and Batting Average.  All the preceding events happen simultaneously.

One more thing before I proceed with both leagues: I created an arbitrary though sufficient way of invoking the playoff variable. Teams primed for a playoff position were given an extra weight in wins, for team wins have a positive, albeit stunningly slight, correlation to voting points. The weight was calculated by adding the square root of projected team wins to the projected wins themselves. So a team on pace for 100 wins would be credited with 110 if they are likely to make the playoffs, and conversely, a team would not be weighted if they had a projection of 90 wins with no post-season prospects. It’s a sliding scale. Again, arbitrary, but after messing around with the correlations, this seemed to be a solid method of including the post-season factor.
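
The weighting described above can be sketched as follows. The function name and the boolean playoff flag are assumptions; the post does not say how "likely to make the playoffs" was decided.

```python
import math


def weighted_team_wins(projected_wins, playoff_likely):
    """TWx = sqrt(TW) + TW for likely playoff teams, plain TW otherwise."""
    if playoff_likely:
        return projected_wins + math.sqrt(projected_wins)
    return float(projected_wins)


print(weighted_team_wins(100, True))   # 110.0, the 100-win example in the text
print(weighted_team_wins(90, False))   # 90.0, no post-season prospects
```

Because the bonus is a square root, it grows slowly: a 81-win playoff team gains 9 wins, a 100-win team gains 10.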

American League

Here is the correlation matrix for the American League (‘twx’ indicates adjusted team wins; everything else is labeled appropriately):

             |  votepts      war      twx       hr      rbi        r       ba      obp      ops      slg       sb
-------------+---------------------------------------------------------------------------------------------------
     votepts |   1.0000
         war |   0.5355   1.0000
         twx |   0.2800   0.0654   1.0000
          hr |   0.3809   0.2404  -0.0676   1.0000
         rbi |   0.4573   0.2614  -0.0265   0.8230   1.0000
           r |   0.4026   0.5326   0.0392   0.2707   0.3019   1.0000
          ba |   0.3341   0.3763  -0.0767  -0.2653  -0.0843   0.1744   1.0000
         obp |   0.3946   0.5200   0.0610   0.2222   0.1924   0.1843   0.5332   1.0000
         ops |   0.4841   0.4874  -0.0397   0.6781   0.5803   0.2022   0.3519   0.8006   1.0000
         slg |   0.4560   0.3952  -0.0863   0.8119   0.6937   0.1798   0.2019   0.5726   0.9496   1.0000
          sb |  -0.0891   0.1202   0.0456  -0.4882  -0.5237   0.3247   0.0853  -0.1977  -0.4473  -0.5093   1.0000

Stolen bases having a negative correlation may seem surprising, but at this point I feel it may be appropriate to refer to Gould’s ‘bio-mechanical limit’ theory to dignify. However, I won’t rhapsodize further (or even at all), for fear of frivolous in-articulation, as well as a mind-hurtling digression that is completely unnecessary.

The coefficients:

      Source |       SS       df       MS              Number of obs =     212
-------------+------------------------------           F(  5,   206) =   48.97
       Model |  1058497.82     5  211699.563           Prob > F      =  0.0000
    Residual |  890627.995   206  4323.43687           R-squared     =  0.5431
-------------+------------------------------           Adj R-squared =  0.5320
       Total |  1949125.81   211  9237.56309           Root MSE      =  65.753

------------------------------------------------------------------------------
     votepts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         twx |   2.060374   .3189046     6.46   0.000     1.431638    2.689109
         war |   11.34832   3.032244     3.74   0.000     5.370106    17.32653
          hr |   2.330891    .661193     3.53   0.001     1.027318    3.634464
          ba |   1362.608   226.1278     6.03   0.000     916.7862    1808.429
          rp |   1.278251   .4477948     2.85   0.005     .3954025    2.161099
       _cons |  -782.4358   79.27622    -9.87   0.000    -938.7326   -626.1391
------------------------------------------------------------------------------

The variable ‘rp’ indicates Runs Produced, which is simply (RBI + R) / 2. The voting for the AL MVP, the AL being an offensively geared league as compared to the NL, leans towards an inclusion of runs scored as well as RBI. For ocularity I just inserted Runs Produced into the process; running either regression on voting points results in virtually identical descriptive statistics.

Now with the given data above it’s easier to construct a formidable scale of probability. Then later the scale can be leveraged to the general nature of the players involved (what team they play for, etc…).

Here are the odds of the AL MVP, using players that registered voting points produced by the above coefficients and the current season stats. Current in this case means current projection/pace. All statistics were purely flat projections, except for Batting Average which was regressed to a .285 over the remaining number of likely ABs for each player (.285 based solely on my preference, applies to NL as well).

NAME            WAR  TWX  HR  RBI    AVG    R  PRED POINTS  PROB     ODDS
J. Hamilton    7.18  103  34  110  0.338  106       174.85   13%    6.5/1
R. Cano        7.82  111  29  101  0.316  108        141.9   11%    8.3/1
A. Beltre      6.59  102  29  111  0.319   86       115.97    9%   10.3/1
M. Cabrera     6.84   78  37  131  0.325  108       112.24    8%   10.7/1
P. Konerko     5.12  100  40  111  0.297   91        80.94    6%   15.3/1
N. Swisher      5.3  111  31   95  0.294   99        80.56    6%   15.4/1
K. Youkilis    5.77  102  27   89  0.301  111        74.38    6%   16.7/1
M. Teixeira    5.03  111  36  121  0.264  116        72.37    5%   17.2/1
J. Bautista    5.17   85  50  123  0.265  108        66.85    5%   18.7/1
C. Crawford    4.34  108  17   84  0.294  113        61.35    5%   20.5/1
E. Longoria    7.26  108  21  101  0.286  101        57.85    4%   21.8/1
V. Guerrero     1.9  103  30  122  0.293   94         57.8    4%   21.8/1
D. Young       0.83  101  20  119  0.310   77         46.5    4%   27.4/1
J. Mauer       6.10  101  10   90  0.315   96        42.83    3%   29.8/1
M. Young       3.38  103  24   91  0.290  109        37.99    3%   33.7/1
A. Rios        3.18  100  24   93  0.295   93        32.14    2%   40.1/1
D. Ortiz       3.16  102  36  108  0.269   89        30.34    2%   42.5/1
A. Rodriguez   3.77  111  26  129  0.267   77         25.9    2%     50/1
D. Jeter       1.81  111  13   73  0.282  118         8.96    1%  146.5/1
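
The PROB and ODDS columns in the AL table above appear to follow from the PRED POINTS column: each player's share of the total predicted points, converted to odds against and cut (not rounded) to one decimal. That is inferred from the numbers rather than stated in the post, and the function name is hypothetical.

```python
import math

# PRED POINTS column of the AL table above, top to bottom.
PRED_POINTS = [174.85, 141.9, 115.97, 112.24, 80.94, 80.56, 74.38, 72.37,
               66.85, 61.35, 57.85, 57.8, 46.5, 42.83, 37.99, 32.14, 30.34,
               25.9, 8.96]


def share_and_odds(points, all_points):
    """Return (share of total points, odds against cut to one decimal)."""
    prob = points / sum(all_points)
    odds = (1 - prob) / prob
    return prob, math.floor(odds * 10) / 10


# Show the first four rows (Hamilton, Cano, Beltre, Cabrera).
for pts in PRED_POINTS[:4]:
    prob, odds = share_and_odds(pts, PRED_POINTS)
    print(f"{pts:>7}: {prob:.0%}  {odds}/1")
```

Run over the whole column, this gives 6.5/1 for Hamilton and 146.5/1 for Jeter, matching the table.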

Once the real odds are released I’ll re-address the table and strip away any players that are unlikely to win. A brief survey and one can identify about half a dozen that have a very slim chance of being considered. The odds will be adjusted accordingly.

National League

There is a systematic infection of irreducibility that arises after analyzing the NL data corresponding to Barry Bonds. I removed Bonds for obvious reasons: he skews the entire process. After removing Bonds I re-allocated the voting points to the other candidates, with some reason and adequacy in the end, but I didn’t put too much thought into it, so it could have been better.

The rest of the process was the same as the AL. For the NL, there was a small difference in the variables that were selected as a result of the most optimal regression, and they are apparent by observing the results below. In both leagues, the four major variables (batting average, RBI, HR, wins) provide a much expected symbiotic relationship with the distribution of voting points.

The NL correlation matrix:

             |  votepts      twx      war       hr      rbi        r       ba      ops      obp      slg       sb
-------------+---------------------------------------------------------------------------------------------------
     votepts |   1.0000
         twx |   0.2844   1.0000
         war |   0.5730   0.1194   1.0000
          hr |   0.5342  -0.1629   0.3935   1.0000
         rbi |   0.5187  -0.1536   0.3689   0.8242   1.0000
           r |   0.4161  -0.1116   0.5578   0.4451   0.4117   1.0000
          ba |   0.2885  -0.0937   0.4192  -0.0546   0.0701   0.1110   1.0000
         ops |   0.5379  -0.1334   0.5907   0.6584   0.5781   0.2957   0.6023   1.0000
         obp |   0.3718  -0.0757   0.5785   0.2745   0.2970   0.2247   0.6984   0.8366   1.0000
         slg |   0.5562  -0.1467   0.5269   0.7705   0.6488   0.2956   0.4834   0.9629   0.6579   1.0000
          sb |  -0.0993  -0.0047  -0.0051  -0.3482  -0.3985   0.2915  -0.1109  -0.3776  -0.2489  -0.3972   1.0000

Here are the regression results:

      Source |       SS       df       MS              Number of obs =     216
-------------+------------------------------           F(  6,   209) =   54.96
       Model |  1696461.17     6  282743.528           Prob > F      =  0.0000
    Residual |  1075285.46   209  5144.90649           R-squared     =  0.6121
-------------+------------------------------           Adj R-squared =  0.6009
       Total |  2771746.63   215  12891.8448           Root MSE      =  71.728

------------------------------------------------------------------------------
     votepts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         twx |   3.310143     .38732     8.55   0.000     2.546588    4.073697
         war |   8.539037   3.239998     2.64   0.009     2.151772     14.9263
          hr |   4.478822   .8344577     5.37   0.000     2.833789    6.123855
          ba |   1313.972   239.9688     5.48   0.000     840.9021    1787.041
         rbi |   1.012889    .395473     2.56   0.011     .2332621    1.792517
          sb |   1.402283   .4044013     3.47   0.001     .6050547    2.199512
       _cons |  -916.8518   91.12105   -10.06   0.000    -1096.486   -737.2176
------------------------------------------------------------------------------

NL MVP Odds:

NAME            WAR     TWX  HR  RBI    AVG    R  SB  PRED POINTS  PROB     ODDS
A. Pujols      6.34  101.98  39  118  0.303  105  15        180.1   21%    4.4/1
J. Votto       6.23   100.3  40  109  0.310  114  11       177.82   21%    4.5/1
C. Gonzalez    4.61   93.72  36  111  0.312  110  26       116.29   14%    7.4/1
M. Holliday    5.49  101.98  30  100  0.301   96  10       108.97   13%    7.9/1
A. Gonzalez    7.04  106.46  31  103  0.295   93   0       107.62   13%      8/1
R. Howard      2.79   99.65  33  116  0.290   93   1        82.13   10%   10.9/1
A. Huff        6.86  100.94  28   94  0.296  105   7        76.82    9%   11.7/1
J. Werth       5.31   99.65  22   84  0.299   98  11        43.96    5%   21.2/1
D. Uggla       4.23      81  36  101  0.284  112   4        34.23    4%   27.5/1
M. Prado       3.35   104.1  19   62  0.306  108   6        19.55    2%   48.9/1
A. Dunn        4.43   69.83  43  109  0.276   92   0        14.07    1%   68.4/1
D. Wright      4.19    81.7  24  108  0.287   82  22        13.67    1%   70.4/1
C. Hart        3.71   74.77  33  108  0.288   85   9         2.08    0%  467.9/1

I gave teams that are in playoff/non-playoff contiguity the benefit of the doubt in both leagues (i.e. Red Sox, Rockies). If value is to be found it can only be found in a scenario that enables the most strict possible calculation of probability.

The spectrum of calculating odds is tethered to the number of players considered as candidates. With time and some employment of elementary baseball logic, comes a higher semblance of accuracy. Once the season progresses I can strive for more precision.

I’ll revisit as players vault in and out of MVP candidacy.


MVP and Cy Young Odds Evaluation Process

I decided to take a look at the MVP and Cy Young odds. In doing so, I felt it appropriate to utilize the Excel macro framework uncovered in my college football and, subsequently, line movement survey posts. Both required some form of data extraction process, using different sites but implementing similar programming methodology.

The preeminent site for baseball awards and statistics is, as one knows, Baseball-Reference.

I haven’t analyzed the numbers yet, but thought it beneficial to share the extraction code for those inclined to undergo some form of related odds evaluation process.

Here is the Excel workbook with 2000-2009 data already in place, with sheets labeled according to the date. Ten years of data seems optimal enough for me. From my standpoint, further experiments in finding fair-value odds on these MVP and Cy Young futures would require including the future odds themselves.

MVP+CyYoung

At length I’ll post the results.
