The Overround
Archive for category MLB
Updated NL/AL MVP Predictor
Posted by Rufio Magillicutty in Featurific, MLB, MVP on August 26, 2010
Some slight changes in MVP position and predicted number of points. The odds have yet to be released offshore. Pujols has hit a devastating surge; he could win the triple crown. I still think Adrian Gonzalez has the best value in either league.
Keep in mind that in both leagues, the players listed are all the players that registered in the predictor equation. Once some players are filtered from the equation, the odds will drop. This is because my formula to create odds divides each player’s predicted points by the total number of predicted points divided by two. Logically, if a player has over half the total points, he would be declared the winner.
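A minimal sketch of that scaling, under stated assumptions: the point values and the helper names `half_total_probs` and `odds_against` are hypothetical, and the `1/p - 1` conversion to an X/1 price is inferred rather than stated in the post.

```python
def half_total_probs(points):
    """Scale each player's predicted points by half of the field's total."""
    half_total = sum(points.values()) / 2
    return {name: pts / half_total for name, pts in points.items()}


def odds_against(prob):
    """Convert a probability into X/1 odds against (assumed conversion)."""
    return 1 / prob - 1


# Hypothetical predicted points, not taken from any table on this page.
field = {"Player A": 120.0, "Player B": 90.0, "Player C": 60.0, "Player D": 30.0}

full = half_total_probs(field)
# Filter out the long shot; the remaining players are scaled by a smaller total.
trimmed = half_total_probs({k: v for k, v in field.items() if k != "Player D"})

for name in trimmed:
    before = odds_against(full[name])
    after = odds_against(trimmed[name])
    print(f"{name}: {before:.2f}/1 -> {after:.2f}/1")
```

Because the denominator is half the total, the probabilities across the whole field sum to 200% rather than 100%, and every remaining player's odds shorten when a candidate is removed.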
NL and AL Cy Young
Posted by Rufio Magillicutty in Cy Young, Featurific, Futures, MLB on August 18, 2010
The requisites for establishing a reasonable prediction of the likely Cy Young candidates were similar to what I did with the MVP. Of course different statistics needed to be applied, but the essence of the framework remained the same. Find numbers of consequence and regress.
But it doesn’t take long for one to notice, after an inventory of the candidates of years prior, that the appropriation of voting points to starters as well as closers permits the process to be negotiated via arbitrary re-configuration of statistics. This means, since closers are measured by accumulating saves, an equalizer of saves and wins serves the purpose of adjoining closers and starters into the same regression. Relievers that are not closers are S-O-L. I can assure you, with a fair amount of certainty, a middle reliever will not win the Cy Young.
The aforementioned requires some degree of mathematical ingenuity. The term arbitrary finds itself useful in describing such a process of ingenuity, in addition to being thrown around in describing similar applications. I find it to be a very flexible word of choice. I don’t think it presumptuous to denote the term as having, uniquely, a universal privilege for relating the vicissitudes of the day. When you walk to some particular location, the steps you take are not so much planned as they are the result of having authority over the steps you take for the convenience of arriving at the desired objective. Therefore the route can be considered arbitrary, subjected to the judgment of the walk-man.
Arbitrary in mathematics refers to a constant with an undetermined value. Therefore a formula can determine the value, and act as the constant itself.
During the MVP regression process, I felt it practical to provide a certain weight to players playing on teams that make the playoffs (since a playoff appearance by a player has a positive correlation to voting points), by adding the square root of wins to itself, producing the formula:
Given team wins (TW), solving for weighted team wins (TWx):

$$TW_x = TW + \sqrt{TW}$$
The results served the purpose of adequacy, and I didn’t feel it necessary to experiment further. Obviously the weight will be incommensurate with the reality of the situation, though not beyond the risk of ineffably skewing the odds. I did not much deliberate on the matter simply because the resulting odds appeared to be agreeable to reason.
To proceed with creating a formula to assimilate starters and closers into one equation, I partitioned the two types of pitchers, and found the correlations. Closers are well represented in the voting practices of the writers, for both leagues.
I was initially surprised to find that team wins and playoff appearance had a negative correlation to voting points. A playoff appearance cost a player around six points. But my vexation was swiftly addressed and pacified through recall of some of the past winners. Call it the Roy Halladay effect, or concomitantly, the Lincecum effect.
More in line with reason, ERA and WAR were highly correlated to the trends of the voters, with starting pitchers and closers. As well, saves were treated as consequential with closers, and player wins with starting pitchers.
However, that is basically it. Unexpectedly, strikeouts, strikeout ratio, and strikeout/walk ratio created very little substance in the evaluation of positive or negative relationships. FIP and other advanced pitching metrics are still seen as invidious creations of new age sabermetricians, and the old habits of the writers persist. (Though with the recent influx of some prominent sabermetricians into the BBWAA, hard-core baseball statistics and advanced metrics may usurp the incapacity of conventional numbers.)
Now what is left is a painstaking gap to fill, that of saves and wins. How can the two reconcile to prescribe to the limits of sample size?
Left to the devices of my own ingenuity, it didn’t take but seven to ten minutes of concerted thought to find resolve. And using the fundamentals of the simple formula concocted from the MVP regression, demonstrated above, I merely jutted the basics until my humble vanity was content.
Wins (W), Saves (SV), solving for SVW (Saves to Wins)

Closer wins are putatively rudiments of blown saves, so more or less that is a neutral statistic. One major difference on this occasion: I decided the conclusion should be instantiated to at least some end and with rationale, so I retrofitted the formula to a formidable interval of average wins. I simply averaged the wins of starters, and then found the average wins of closers after implementing the formula. I did this for both leagues:
AL:
SP average wins = 18.18
CP average wins = 15.87
NL:
SP average wins = 17.83
CP average wins = 14.23
The re-adjustment of saves to wins naturally combats the asymmetry between SP and CP ERA, therefore the weights of the coefficients are equalized. And to further validate the insertion of the formula, Eric Gagne remained the 2003 winner upon applying post-regression numbers.
The invention left me with a sense of great satisfaction, which I do not hesitate to admit. The arbitrary formula, pragmatically, is rather banal outside the spectrum of this particular regression. It’s a meaningless scale that is not afforded qualities of general utility in evaluating players. WAR and WPA are sufficient for comparison across subsets of positions.
Here are regression coefficients and odds.
American League
Source | SS df MS Number of obs = 66
-------------+------------------------------ F( 3, 63) = 47.04
Model | 165575.187 3 55191.7289 Prob > F = 0.0000
Residual | 73919.2691 63 1173.32173 R-squared = 0.6914
-------------+------------------------------ Adj R-squared = 0.6767
Total | 239494.456 66 3628.70387 Root MSE = 34.254
------------------------------------------------------------------------------
votepts | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
war | 7.365279 2.725402 2.70 0.009 1.918998 12.81156
era | -29.23094 5.821598 -5.02 0.000 -40.86448 -17.59741
svw | 5.355 1.420173 3.77 0.000 2.517011 8.19299
------------------------------------------------------------------------------
Cy Young Odds:
| NAME | G | WINS | LOSSES | K/9 | BB/9 | ERA | SV | PROB | ODDS |
|---|---|---|---|---|---|---|---|---|---|
| David Price | 32 | 21 | 7 | 8.4 | 3.8 | 3.03 | 0 | 21.37% | 3.6/1 |
| Jon Lester | 33 | 18 | 10 | 9.2 | 3.1 | 3.12 | 0 | 17.75% | 4.6/1 |
| CC Sabathia | 34 | 21 | 7 | 6.9 | 3 | 3.45 | 0 | 15.65% | 5.3/1 |
| Clay Buchholz | 27 | 18 | 7 | 6.1 | 3.4 | 2.99 | 0 | 15.53% | 5.4/1 |
| Trevor Cahill | 29 | 17 | 7 | 5.2 | 2.7 | 2.87 | 0 | 13.77% | 6.2/1 |
| Carl Pavano | 33 | 21 | 10 | 5.2 | 1.6 | 3.95 | 0 | 13.27% | 6.5/1 |
| Mariano Rivera | 60 | 4 | 3 | 7.7 | 1.5 | 1.98 | 33 | 11.97% | 7.3/1 |
| Jered Weaver | 34 | 15 | 10 | 10 | 2.3 | 3.21 | 0 | 11.53% | 7.6/1 |
| Rafael Soriano | 65 | 3 | 1 | 7.5 | 1.9 | 2.32 | 47 | 10.19% | 8.8/1 |
| John Danks | 32 | 16 | 11 | 7 | 2.8 | 3.51 | 0 | 10.03% | 8.9/1 |
| Joakim Soria | 67 | 0 | 3 | 9.7 | 2.4 | 2.05 | 45 | 9.29% | 9.7/1 |
| Cliff Lee | 29 | 14 | 8 | 7.8 | 0.5 | 3.45 | 0 | 7.55% | 12.2/1 |
| Jonathan Papelbon | 67 | 5 | 7 | 8.2 | 3.6 | 2.57 | 39 | 6.38% | 14.6/1 |
| Francisco Liriano | 32 | 15 | 10 | 9.8 | 2.8 | 3.54 | 0 | 5.96% | 15.7/1 |
| Andrew Bailey | 53 | 1 | 4 | 6.5 | 2.5 | 1.62 | 28 | 5.84% | 16.1/1 |
| Justin Verlander | 33 | 18 | 10 | 8.4 | 3.3 | 3.81 | 0 | 4.80% | 19.8/1 |
| Jeff Niemann | 30 | 14 | 4 | 6.5 | 2.8 | 3.29 | 0 | 4.52% | 21.1/1 |
| C.J. Wilson | 33 | 15 | 7 | 7 | 4.2 | 3.68 | 0 | 4.18% | 22.9/1 |
| Felix Hernandez | 35 | 11 | 14 | 8.2 | 2.5 | 2.99 | 0 | 3.92% | 24.4/1 |
| Brian Duensing | 60 | 8 | 1 | 5 | 2.2 | 2.19 | 0 | 1.93% | 50.9/1 |
| Alexi Ogando | 35 | 4 | 1 | 8.4 | 3.7 | 1.01 | 0 | 1.77% | 55.6/1 |
| Andy Pettitte | 25 | 15 | 3 | 7 | 3 | 3.70 | 0 | 1.42% | 69.4/1 |
| Phil Hughes | 30 | 19 | 7 | 7.4 | 2.5 | 4.01 | 0 | 1.39% | 70.8/1 |
National League
Regression results:
Source | SS df MS Number of obs = 64
-------------+------------------------------ F( 3, 61) = 49.35
Model | 210353.074 3 70117.6914 Prob > F = 0.0000
Residual | 86668.9258 61 1420.80206 R-squared = 0.7082
-------------+------------------------------ Adj R-squared = 0.6939
Total | 297022 64 4640.96875 Root MSE = 37.694
------------------------------------------------------------------------------
votepts | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
war | 9.735335 2.944883 3.31 0.002 3.846677 15.62399
svw | 6.053746 1.591225 3.80 0.000 2.871894 9.235598
era | -37.1176 7.397198 -5.02 0.000 -51.90921 -22.32599
------------------------------------------------------------------------------
Cy Young Odds:
| NAME | G | WINS | LOSSES | K/9 | BB/9 | ERA | SV | WAR | PROB | ODDS |
|---|---|---|---|---|---|---|---|---|---|---|
| Adam Wainwright | 35 | 24 | 8 | 8.1 | 2.1 | 2.51 | 0 | 7.54 | 33.97% | 1.9/1 |
| Ubaldo Jimenez | 33 | 24 | 4 | 8.4 | 3.6 | 3.00 | 0 | 7.20 | 27.68% | 2.6/1 |
| Roy Halladay | 35 | 21 | 11 | 8.2 | 1 | 3.01 | 0 | 8.45 | 26.30% | 2.8/1 |
| Tim Hudson | 33 | 19 | 7 | 4.9 | 3 | 3.03 | 0 | 7.55 | 21.14% | 3.7/1 |
| Josh Johnson | 33 | 14 | 7 | 8.8 | 2.2 | 2.65 | 0 | 7.89 | 17.00% | 4.8/1 |
| Mat Latos | 30 | 17 | 7 | 8.9 | 2.6 | 2.45 | 0 | 4.71 | 15.17% | 5.5/1 |
| Billy Wagner | 73 | 8 | 3 | 12.5 | 2.6 | 2.19 | 40 | 2.47 | 14.45% | 5.9/1 |
| Heath Bell | 71 | 7 | 0 | 11.5 | 3.7 | 2.55 | 48 | 2.35 | 11.94% | 7.3/1 |
| Johan Santana | 34 | 14 | 8 | 6.4 | 2.7 | 3.04 | 0 | 5.90 | 7.60% | 12.1/1 |
| Chris Carpenter | 36 | 18 | 6 | 7.1 | 2.5 | 3.52 | 0 | 4.75 | 6.99% | 13.3/1 |
| Brian Wilson | 68 | 4 | 1 | 12.6 | 3.4 | 2.67 | 45 | 3.13 | 6.61% | 14.1/1 |
| Yovani Gallardo | 31 | 15 | 7 | 9.9 | 3.6 | 3.13 | 0 | 4.08 | 3.90% | 24.6/1 |
| Jaime Garcia | 31 | 14 | 7 | 7.1 | 3.6 | 2.78 | 0 | 3.21 | 3.47% | 27.7/1 |
| Jonny Venters | 76 | 5 | 0 | 9.6 | 4 | 1.19 | 1 | 2.20 | 3.43% | 28.1/1 |
| John Axford | 48 | 10 | 1 | 10.4 | 4 | 2.90 | 23 | 1.63 | 0.29% | 347.4/1 |
| Clayton Kershaw | 33 | 14 | 10 | 9.4 | 3.8 | 3.21 | 0 | 3.81 | 0.06% | 1764.1/1 |
At this point there is nothing to compare the odds to. They haven’t been released offshore or in Vegas. At length I’ll revisit with some thoughts on the matter.
Refining NL and AL MVP Odds
Posted by Rufio Magillicutty in Featurific, Futures, MLB, MVP on August 17, 2010
My previous efforts demanded much attention. I made a careless mistake with WAR, and neglected to project for the remainder of the season. It does alter the position of the candidates slightly. The distribution of odds was an amateur effort on my part as well, and I have since refined the process. Now the table is far more agreeable.
Additionally, I decided to add more arbitrary measures to regressing batting average. I regressed all the candidates to .285 over 400 ABs, then for reasons unfounded, I regressed that number to their per year average rate (AB and AVG).
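One common reading of "regressed to .285 over 400 ABs" is to pad a player's line with 400 at-bats of .285 hitting. The sketch below assumes that reading, which is not confirmed as the exact calculation used here; the function name and the sample line are hypothetical, and the second step (regressing to the per year average rate) is not shown.

```python
def regress_average(hits, at_bats, prior_avg=0.285, prior_ab=400):
    """Blend an observed average with prior_ab at-bats of prior_avg hitting."""
    return (hits + prior_avg * prior_ab) / (at_bats + prior_ab)


# Hypothetical line: 150 hits in 450 AB (.333 observed).
print(round(regress_average(150, 450), 3))  # 0.311
```

The fewer at-bats a player has, the harder the 400-AB prior pulls his average toward .285.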
MVP Odds for Baseball
Posted by Rufio Magillicutty in Betting, Featurific, MLB, MVP on August 15, 2010
Any eight-year-old baseball fanatic can isolate the handful of players in each league that are likely to win the MVP. And with a high degree of certainty, the winner being inevitably chosen from that handful of players is justifiably expected. Popularity is obviously a major indicator of likely MVP consideration. But I have a deep suspicion of any line of reasoning derived from untested opinion or visceral assumptions, rather than some form of empiricism. This calls for a regression model, using the voting trends from the last ten years (2000-2009).
Qualitative measures are hard to analyze. Let’s use Carlos Gonzalez as an example. Playing with the mid-market profile Rockies, qualitative points are by default removed simply because of the perceived Coors Field offensive inflation factor, as well as not being in a major market. His stats may be considered deceiving, and for him to have a realistic chance, even though the raw numbers would indicate evidence to the contrary, the Rockies will probably have to win the NL West. Immediately this brings forth a comparison to Matt Holliday, who also played for the Rockies, Wild-Card winner in 2007, the year Holliday finished 2nd in the MVP, despite leading the league in total bases, RBI, batting average, doubles, and tied for 2nd in the league in OPS as well as a top 5 WAR. The eventual winner, Jimmy Rollins, playing for the major market Philadelphia Phillies, had an underwhelming OPS+ (119), a WAR rated three spots lower than Holliday, and a ballpark that had yet to reach its current consensus status of bandbox.
Not saying Rollins was undeserving, but just creating a proxy for the categorical particulates that may dim the chances of certain players receiving the votes necessary to win.
Certainly there are other intangibles that can manifest themselves spontaneously, and at times erroneously, because ultimately the votes accumulated for the MVP award are attributable to the subjectivity of the writers.
To expound further in that respect, a meta-analysis, which would involve an overall abstract of each player’s popularity level and his respective team’s status in relation to the configured mindset of the writers responsible for the voting process, could be broken down into various components that enable a measure of quantification. But that would take too long.
The goal here is to search for value. To accomplish this, creating odds to compare to the actual odds is the best approach, followed by evaluating the players from a qualitative standpoint, which I won’t address here. But as the season reaches its end I will update the prevailing consensus concerning the likely top MVP candidates, and adjust my odds accordingly.
A multivariate analysis using Stata, a program that operates with a sharp understanding on the benefits of parsimonious exertion, allows for a flexible and facilitating process.
I found the correlations between each crucial stat line and the overall voting points, which is what decides the winner of the MVP. Of course, I separated the American and National League, each having their separate yet similar MVP selection process.
I stripped pitchers from the MVP equation, and will revisit the pitchers later when I assess the Cy Young odds.
The variables used were team wins, WAR, batting average, home runs, RBI, runs, stolen bases. These were chosen after some trial and error, but the inherent value of the variables used can certainly reconcile with logic. WAR may be considered redundant in terms of information content, as well as redundant in compatibility with the other player variables. However, WAR has its advantages because of the all-inclusive nature of the statistic: Position adjustments, defense, sophisticated offense metrics. It is an overall measure of player viability that has yet to hit mainstream and a faction of writers may not even attempt to recognize or consider it as part of their voting practices. Since the regression coalesces to the tendencies of the voters by way of voting points, the most common statistics must be included, and redundancies are to be expected. Runs and HRs are not mutually exclusive variables, neither are Home runs and Slugging, or even Home runs and Batting Average. All the preceding events happen simultaneously.
One more thing before I proceeded with both leagues: I created an arbitrary though sufficient way of invoking the playoff variable. Teams primed for a playoff position were given an extra weight in wins, for team wins have a positive, albeit stunningly slight, correlation to voting points. The weight was calculated by adding the square root of projected team wins to itself. So a team on pace for 100 wins would be credited with 110 if they are likely to make the playoffs, and conversely, a team would not be weighted if they had a projection of 90 wins with no post-season prospects. It’s a sliding scale. Again, arbitrary, but after messing around with the correlations, this seemed to be a solid method of including the post-season factor.
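The weighting just described can be sketched in a few lines. The function name and the `playoff_bound` flag are hypothetical; deciding which teams count as likely playoff teams is a judgment call that the code does not attempt.

```python
import math


def weighted_team_wins(projected_wins, playoff_bound):
    """Add sqrt(wins) to projected wins for teams likely to reach the playoffs."""
    if playoff_bound:
        return projected_wins + math.sqrt(projected_wins)
    return projected_wins


print(weighted_team_wins(100, True))   # 110.0
print(weighted_team_wins(90, False))   # 90
```

Because the bonus is a square root, it grows slowly: a 100-win playoff team gains 10 wins, while an 81-win playoff team gains 9.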
American League
Here is the correlation matrix for the American League (‘twx’ indicates adjusted team wins, everything else is labeled appropriately):
| votepts war twx hr rbi r ba obp ops slg sb
-------------+---------------------------------------------------------------------------------------------------
votepts | 1.0000
war | 0.5355 1.0000
twx | 0.2800 0.0654 1.0000
hr | 0.3809 0.2404 -0.0676 1.0000
rbi | 0.4573 0.2614 -0.0265 0.8230 1.0000
r | 0.4026 0.5326 0.0392 0.2707 0.3019 1.0000
ba | 0.3341 0.3763 -0.0767 -0.2653 -0.0843 0.1744 1.0000
obp | 0.3946 0.5200 0.0610 0.2222 0.1924 0.1843 0.5332 1.0000
ops | 0.4841 0.4874 -0.0397 0.6781 0.5803 0.2022 0.3519 0.8006 1.0000
slg | 0.4560 0.3952 -0.0863 0.8119 0.6937 0.1798 0.2019 0.5726 0.9496 1.0000
sb | -0.0891 0.1202 0.0456 -0.4882 -0.5237 0.3247 0.0853 -0.1977 -0.4473 -0.5093 1.0000
Stolen bases having a negative correlation may seem surprising, but at this point I feel it may be appropriate to refer to Gould’s ‘bio-mechanical limit’ theory to dignify. However, I won’t rhapsodize further (or even at all), for fear of frivolous in-articulation, as well as a mind-hurtling digression that is completely unnecessary.
The coefficients:
Source | SS df MS Number of obs = 212
-------------+------------------------------ F( 5, 206) = 48.97
Model | 1058497.82 5 211699.563 Prob > F = 0.0000
Residual | 890627.995 206 4323.43687 R-squared = 0.5431
-------------+------------------------------ Adj R-squared = 0.5320
Total | 1949125.81 211 9237.56309 Root MSE = 65.753
------------------------------------------------------------------------------
votepts | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
twx | 2.060374 .3189046 6.46 0.000 1.431638 2.689109
war | 11.34832 3.032244 3.74 0.000 5.370106 17.32653
hr | 2.330891 .661193 3.53 0.001 1.027318 3.634464
ba | 1362.608 226.1278 6.03 0.000 916.7862 1808.429
rp | 1.278251 .4477948 2.85 0.005 .3954025 2.161099
_cons | -782.4358 79.27622 -9.87 0.000 -938.7326 -626.1391
------------------------------------------------------------------------------
The variable ‘rp’ indicates Runs Produced, which is simply [formula not preserved]. AL MVP voting, in a league that is offensively geared compared to the NL, leans toward including runs scored as well as RBI. For readability I just inserted Runs Produced into the process; running either regression on voting points results in virtually identical descriptive statistics.
Now with the given data above it’s easier to construct a formidable scale of probability. Then later the scale can be leveraged to the general nature of the players involved (what team they play for, etc…).
Here are the odds of the AL MVP, using players that registered voting points produced by the above coefficients and the current season stats. Current in this case means current projection/pace. All statistics were purely flat projections, except for Batting Average, which was regressed to .285 over the remaining number of likely ABs for each player (.285 based solely on my preference, applies to NL as well).
| NAME | WAR | TWX | HR | RBI | AVG | R | PRED POINTS | PROB | ODDS |
|---|---|---|---|---|---|---|---|---|---|
| J. Hamilton | 7.18 | 103 | 34 | 110 | 0.338 | 106 | 174.85 | 13% | 6.5/1 |
| R. Cano | 7.82 | 111 | 29 | 101 | 0.316 | 108 | 141.9 | 11% | 8.3/1 |
| A. Beltre | 6.59 | 102 | 29 | 111 | 0.319 | 86 | 115.97 | 9% | 10.3/1 |
| M. Cabrera | 6.84 | 78 | 37 | 131 | 0.325 | 108 | 112.24 | 8% | 10.7/1 |
| P. Konerko | 5.12 | 100 | 40 | 111 | 0.297 | 91 | 80.94 | 6% | 15.3/1 |
| N. Swisher | 5.3 | 111 | 31 | 95 | 0.294 | 99 | 80.56 | 6% | 15.4/1 |
| K. Youkilis | 5.77 | 102 | 27 | 89 | 0.301 | 111 | 74.38 | 6% | 16.7/1 |
| M. Teixeira | 5.03 | 111 | 36 | 121 | 0.264 | 116 | 72.37 | 5% | 17.2/1 |
| J. Bautista | 5.17 | 85 | 50 | 123 | 0.265 | 108 | 66.85 | 5% | 18.7/1 |
| C. Crawford | 4.34 | 108 | 17 | 84 | 0.294 | 113 | 61.35 | 5% | 20.5/1 |
| E. Longoria | 7.26 | 108 | 21 | 101 | 0.286 | 101 | 57.85 | 4% | 21.8/1 |
| V. Guerrero | 1.9 | 103 | 30 | 122 | 0.293 | 94 | 57.8 | 4% | 21.8/1 |
| D. Young | 0.83 | 101 | 20 | 119 | 0.310 | 77 | 46.5 | 4% | 27.4/1 |
| J. Mauer | 6.10 | 101 | 10 | 90 | 0.315 | 96 | 42.83 | 3% | 29.8/1 |
| M. Young | 3.38 | 103 | 24 | 91 | 0.290 | 109 | 37.99 | 3% | 33.7/1 |
| A. Rios | 3.18 | 100 | 24 | 93 | 0.295 | 93 | 32.14 | 2% | 40.1/1 |
| D. Ortiz | 3.16 | 102 | 36 | 108 | 0.269 | 89 | 30.34 | 2% | 42.5/1 |
| A. Rodriguez | 3.77 | 111 | 26 | 129 | 0.267 | 77 | 25.9 | 2% | 50/1 |
| D. Jeter | 1.81 | 111 | 13 | 73 | 0.282 | 118 | 8.96 | 1% | 146.5/1 |
Once the real odds are released I’ll re-address the table and strip away any players that are unlikely to win. A brief survey and one can identify about half a dozen that have a very slim chance of being considered. The odds will be adjusted accordingly.
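The ODDS column in the AL table above appears consistent with each player's share of the summed PRED POINTS, converted as total/points - 1 (it reproduces the 146.5/1 listed for Jeter, for example). The sketch below assumes that relationship, which is inferred from the table rather than stated in the text, and uses a hypothetical 40-point cutoff to show how stripping long shots shortens everyone else's price.

```python
# PRED POINTS column copied from the AL MVP table above.
pred_points = {
    "J. Hamilton": 174.85, "R. Cano": 141.9, "A. Beltre": 115.97,
    "M. Cabrera": 112.24, "P. Konerko": 80.94, "N. Swisher": 80.56,
    "K. Youkilis": 74.38, "M. Teixeira": 72.37, "J. Bautista": 66.85,
    "C. Crawford": 61.35, "E. Longoria": 57.85, "V. Guerrero": 57.8,
    "D. Young": 46.5, "J. Mauer": 42.83, "M. Young": 37.99,
    "A. Rios": 32.14, "D. Ortiz": 30.34, "A. Rodriguez": 25.9,
    "D. Jeter": 8.96,
}


def odds_table(points):
    """Odds against for each player: total points / own points - 1 (assumed)."""
    total = sum(points.values())
    return {name: total / pts - 1 for name, pts in points.items()}


odds = odds_table(pred_points)
print(f"D. Jeter: {odds['D. Jeter']:.1f}/1")  # D. Jeter: 146.5/1

# Hypothetical cutoff: drop anyone projected under 40 points, then recompute.
contenders = {n: p for n, p in pred_points.items() if p >= 40}
trimmed = odds_table(contenders)
print(f"J. Hamilton: {odds['J. Hamilton']:.2f}/1 -> {trimmed['J. Hamilton']:.2f}/1")
```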
National League
There is a systematic infection of irreducibility that arises after analyzing the NL data corresponding to Barry Bonds. I removed Bonds for obvious reasons: he skews the entire process. After removing Bonds I re-allocated the voting points to the other candidates, with some reason and adequacy in the end, but I didn’t put too much thought into it, so it could have been better.
The rest of the process was the same as the AL. For the NL, there was a small difference in the variables that were selected as a result of the most optimal regression, and they are apparent by observing the results below. In both leagues, the four major variables (batting average, RBI, HR, wins) provide a much expected symbiotic relationship with the distribution of voting points.
The NL correlation matrix:
| votepts twx war hr rbi r ba ops obp slg sb
-------------+---------------------------------------------------------------------------------------------------
votepts | 1.0000
twx | 0.2844 1.0000
war | 0.5730 0.1194 1.0000
hr | 0.5342 -0.1629 0.3935 1.0000
rbi | 0.5187 -0.1536 0.3689 0.8242 1.0000
r | 0.4161 -0.1116 0.5578 0.4451 0.4117 1.0000
ba | 0.2885 -0.0937 0.4192 -0.0546 0.0701 0.1110 1.0000
ops | 0.5379 -0.1334 0.5907 0.6584 0.5781 0.2957 0.6023 1.0000
obp | 0.3718 -0.0757 0.5785 0.2745 0.2970 0.2247 0.6984 0.8366 1.0000
slg | 0.5562 -0.1467 0.5269 0.7705 0.6488 0.2956 0.4834 0.9629 0.6579 1.0000
sb | -0.0993 -0.0047 -0.0051 -0.3482 -0.3985 0.2915 -0.1109 -0.3776 -0.2489 -0.3972 1.0000
Here are the regression results:
Source | SS df MS Number of obs = 216
-------------+------------------------------ F( 6, 209) = 54.96
Model | 1696461.17 6 282743.528 Prob > F = 0.0000
Residual | 1075285.46 209 5144.90649 R-squared = 0.6121
-------------+------------------------------ Adj R-squared = 0.6009
Total | 2771746.63 215 12891.8448 Root MSE = 71.728
------------------------------------------------------------------------------
votepts | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
twx | 3.310143 .38732 8.55 0.000 2.546588 4.073697
war | 8.539037 3.239998 2.64 0.009 2.151772 14.9263
hr | 4.478822 .8344577 5.37 0.000 2.833789 6.123855
ba | 1313.972 239.9688 5.48 0.000 840.9021 1787.041
rbi | 1.012889 .395473 2.56 0.011 .2332621 1.792517
sb | 1.402283 .4044013 3.47 0.001 .6050547 2.199512
_cons | -916.8518 91.12105 -10.06 0.000 -1096.486 -737.2176
------------------------------------------------------------------------------
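
As a rough illustration of how the coefficients above turn a stat line into predicted voting points, the sketch below applies them as a plain linear combination. The function name is hypothetical, and the inputs are the rounded values displayed for A. Pujols in the NL MVP odds table, so the result need not match the PRED POINTS column exactly (the exact projected inputs are not shown).

```python
# Coefficients copied from the NL regression output above.
NL_COEF = {
    "twx": 3.310143, "war": 8.539037, "hr": 4.478822,
    "ba": 1313.972, "rbi": 1.012889, "sb": 1.402283,
}
NL_CONS = -916.8518


def predict_votepts(stats, coef=NL_COEF, cons=NL_CONS):
    """Linear prediction: the constant plus each coefficient times its stat."""
    return cons + sum(coef[k] * stats[k] for k in coef)


# Rounded inputs as displayed for A. Pujols in the NL MVP odds table.
pujols = {"twx": 101.98, "war": 6.34, "hr": 39, "ba": 0.303, "rbi": 118, "sb": 15}
print(round(predict_votepts(pujols), 2))
```

With these rounded inputs the prediction comes out near 188 points, in the same neighborhood as the table's figure but not identical to it.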
NL MVP Odds:
| NAME | WAR | TWX | HR | RBI | AVG | R | SB | PRED POINTS | PROB | ODDS |
|---|---|---|---|---|---|---|---|---|---|---|
| A. Pujols | 6.34 | 101.98 | 39 | 118 | 0.303 | 105 | 15 | 180.1 | 21% | 4.4/1 |
| J. Votto | 6.23 | 100.3 | 40 | 109 | 0.310 | 114 | 11 | 177.82 | 21% | 4.5/1 |
| C. Gonzalez | 4.61 | 93.72 | 36 | 111 | 0.312 | 110 | 26 | 116.29 | 14% | 7.4/1 |
| M. Holliday | 5.49 | 101.98 | 30 | 100 | 0.301 | 96 | 10 | 108.97 | 13% | 7.9/1 |
| A. Gonzalez | 7.04 | 106.46 | 31 | 103 | 0.295 | 93 | 0 | 107.62 | 13% | 8/1 |
| R. Howard | 2.79 | 99.65 | 33 | 116 | 0.290 | 93 | 1 | 82.13 | 10% | 10.9/1 |
| A. Huff | 6.86 | 100.94 | 28 | 94 | 0.296 | 105 | 7 | 76.82 | 9% | 11.7/1 |
| J. Werth | 5.31 | 99.65 | 22 | 84 | 0.299 | 98 | 11 | 43.96 | 5% | 21.2/1 |
| D. Uggla | 4.23 | 81 | 36 | 101 | 0.284 | 112 | 4 | 34.23 | 4% | 27.5/1 |
| M. Prado | 3.35 | 104.1 | 19 | 62 | 0.306 | 108 | 6 | 19.55 | 2% | 48.9/1 |
| A. Dunn | 4.43 | 69.83 | 43 | 109 | 0.276 | 92 | 0 | 14.07 | 1% | 68.4/1 |
| D. Wright | 4.19 | 81.7 | 24 | 108 | 0.287 | 82 | 22 | 13.67 | 1% | 70.4/1 |
| C. Hart | 3.71 | 74.77 | 33 | 108 | 0.288 | 85 | 9 | 2.08 | 0% | 467.9/1 |
I gave teams that are in playoff/non-playoff contiguity the benefit of the doubt in both leagues (i.e. Red Sox, Rockies). If value is to be found it can only be found in a scenario that enables the most strict possible calculation of probability.
The spectrum of calculating odds is tethered to the number of players considered as candidates. With time and some employment of elementary baseball logic, comes a higher semblance of accuracy. Once the season progresses I can strive for more precision.
I’ll revisit as players vault in and out of MVP candidacy.
MVP and Cy Young Odds Evaluation Process
Posted by Rufio Magillicutty in Featurific, MLB, Visual Basic on July 28, 2010
I decided to take a look at the MVP and Cy Young odds. In doing so, I felt it appropriate to utilize the Excel macro framework introduced in my college football and subsequent line movement survey posts. Both required some form of data extraction process, using different sites but implementing similar programming methodology.
The preeminent site for baseball awards and statistics is, as one knows, Baseball-Reference.
I haven’t analyzed the numbers yet, but thought it beneficial to share the extraction code for those inclined to undergo some form of related odds evaluation process.
Here is the Excel workbook with 2000-2009 data already in place, with sheets labeled according to the date. Ten years of data seems optimal enough for me. From my standpoint, further experiments in finding fair-value odds on these MVP and Cy Young futures would require including the future odds themselves.
At length I’ll post the results.


