The National League was easier to analyze than the American League. By that I mean the voting-points data Baseball-Reference has going back to 2000 makes for a more manageable data set. That is largely because few NL pitchers have received MVP consideration: only five from 2000 to 2010. That is why I hadn't previously included pitchers in the NL MVP Predictor.
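To find the voting points, the formula looks something like this, with x_1, x_2, x_3 standing in for the chosen statistics:
$$\text{Voting Points} = a x_1 + b x_2 + c x_3 + \cdots$$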
Here the coefficients a, b, and c are found by regressing voting points onto several variables (there don't have to be just three) that appear to be statistically significant. Which variables to include is up to the user, but the combination that explains the most variance in voting points is preferable. In this case R² = 0.59. The typical MVP winner earns around 250 to 300 voting points.
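The regression step can be sketched as follows. This is a minimal illustration under stated assumptions, not the predictor's actual code. The predictor columns are hypothetical, since the real variables aren't listed here. The model includes an intercept, which is an assumption. The fit is plain ordinary least squares solved through the normal equations in pure Python. The names `fit_ols`, `predict`, and `r_squared` are illustrative only.

```python
def fit_ols(rows, y):
    """Ordinary least squares with an intercept (an assumption), via the normal equations.

    rows: list of predictor lists, one per player-season (hypothetical inputs).
    y: observed voting points, one per row.
    Returns [intercept, coef_1, coef_2, ...].
    """
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0] + [float(v) for v in r] for r in rows]
    m, k = len(X), len(X[0])
    # Augmented matrix for the normal equations (X^T X) beta = X^T y.
    A = [
        [sum(X[n][i] * X[n][j] for n in range(m)) for j in range(k)]
        + [sum(X[n][i] * y[n] for n in range(m))]
        for i in range(k)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution gives the coefficients.
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = A[i][k] - sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = s / A[i][i]
    return beta


def predict(beta, row):
    """Predicted voting points for one row of predictors."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def r_squared(beta, rows, y):
    """Share of the variance in voting points explained by the fit."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(beta, r)) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

With real data, each row would hold one player-season's statistics and `y` the voting points that player actually received. `r_squared` is the same R² statistic quoted above. Solving the normal equations directly keeps the sketch dependency-free. For larger or nearly collinear predictor sets, a least-squares solver such as `numpy.linalg.lstsq` is numerically safer.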
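To find the probability, take the voting points of each player whose total is positive and divide by half the total voting points of all players. A player who receives more than 50% of the voting points will obviously win the MVP, so the probability is capped at 100%. Expressed in mathematical form, where VP_i is player i's voting points and P_i his probability of winning:
$$P_i = \min\left(1,\ \frac{VP_i}{\tfrac{1}{2}\sum_j VP_j}\right) \quad \text{for } VP_i > 0$$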
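The probability rule can be sketched in a few lines of Python. This is a minimal illustration that assumes predicted voting points arrive as a dictionary keyed by player name. The function name `mvp_probabilities` and the sample players are hypothetical. Following the literal wording, the total is taken over every player passed in. Players with zero or negative points get a probability of 0.

```python
def mvp_probabilities(points):
    """Win probability: positive voting points divided by half the total, capped at 1.

    points: dict mapping player name -> predicted voting points (hypothetical input).
    The total includes every player passed in, per the literal wording of the rule.
    """
    half = sum(points.values()) / 2
    probs = {}
    for player, vp in points.items():
        if vp > 0 and half > 0:
            # More than half of all points means a certain win, hence the cap.
            probs[player] = min(1.0, vp / half)
        else:
            probs[player] = 0.0
    return probs


print(mvp_probabilities({"Player A": 300, "Player B": 200, "Player C": 100}))
```

In that example the three players split a total of 600, so the divisor is 300. The probabilities come out to 1.0, about 0.67, and about 0.33. Each player's share of the total is doubled, so when every total is positive the uncapped values add up to 2 rather than 1. The cap only matters for a player above half the total.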
Historical voting trends may soon become irrelevant as the new generation of sabermetrics becomes the prevailing form of player assessment. Yet making such an assumption would not only be a general statement on my unlikely ability to gauge future voter temperament, but would also be devastating to my entire MVP Predictor. And I would assume such self-mutilating reflections are foreign to the standard-issue practices of bloggers, so let's assume these assumptions were never assumed.
As I said before, the regression excluded pitchers. So I developed an arbitrary formula whose goal was to produce voting points that line up with a reasonable MVP ranking after a brief look at the table. I came up with this:
“Playoffs” is a binary variable, and the points added are either 30 or 0.
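The playoff term in the pitcher formula works like a standard 0/1 indicator (dummy) variable. A minimal sketch follows. `base_points` is a hypothetical stand-in for the rest of the pitcher formula, which isn't reproduced here, and the function name is illustrative.

```python
PLAYOFF_BONUS = 30  # points added when "Playoffs" is 1; otherwise 0


def pitcher_points(base_points, made_playoffs):
    """Add the binary playoff bonus to a pitcher's score.

    base_points: hypothetical placeholder for the remaining terms of the formula.
    made_playoffs: True if the pitcher's team reached the playoffs.
    """
    return base_points + (PLAYOFF_BONUS if made_playoffs else 0)


print(pitcher_points(100, True), pitcher_points(100, False))
```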
The formula appears to work.
NL MVP Top 15:
I think it's safe to say either Braun or Kemp will win the MVP. And if the eventual NL Cy Young winner (presumably Kemp's teammate, Kershaw) has any influence on the MVP vote, then Ryan Braun is your probable winner. I find it hard to believe a third-place team will have both the Cy Young Award winner and the MVP. The joint probability of a mediocre team also having the two best players in the league is probably very low. This line of thinking may not be justifiable, though. If Kemp and Kershaw are the most deserving of their respective awards, why should any other factors come into play? Again, the table above merely displays an eleven-year voting trend and nothing more.
Past MVP voting appears to have depended heavily on team success, judging by the weight the regression places on the playoff variable (for hitters the coefficient is 78.3). But in the “Moneyball” era, many teams are more concerned with actually buying wins than with acquiring stars, and that mindset may spill over into MVP voting. If it does, then “valuable” simply means value to your team, regardless of how good the team is. “Valuable” and “the best” would then be collapsed into one all-encompassing stat, WAR. Fittingly, if value is decided by WAR, both leagues have their “most valuable” player on a mediocre team (Kemp in the NL, Bautista in the AL), without whom their respective teams would “lose” the most wins. Does that make them the most “valuable”?
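To put the hitters' playoff coefficient in perspective, compare it with the 250 to 300 voting points a typical MVP winner earns. Read the usual regression way, the coefficient is the predicted difference between otherwise identical hitters on playoff and non-playoff teams:
$$\frac{78.3}{300} \approx 0.26 \qquad \frac{78.3}{250} \approx 0.31$$
So under this reading, reaching the playoffs alone is worth roughly a quarter to a third of a typical winning total.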