The National League was easier to analyze than the American League. By that I mean the voting-point information baseball-reference has since 2000 makes for a more manageable data set, largely because so few NL pitchers have received MVP consideration: only five from 2000 to 2010. That is why I had not previously included pitchers in the NL MVP Predictor.
To find a player's voting points, the formula looks something like this, where X1, X2, and X3 are the player's statistics:
Voting Points = a*X1 + b*X2 + c*X3
The coefficients a, b, and c are found by regressing voting points on a number of variables (there don't have to be just three) that appear to be statistically significant. Which variables to use is up to the user, but whichever combination explains the most variance in voting points is preferable. In this case R² = .59. The typical MVP winner earns around 250-300 voting points.
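As a rough sketch of the regression step, the snippet below fits voting points to a handful of statistics by ordinary least squares with NumPy and reports R², the share of variance explained. It also estimates a constant term. The data are synthetic and the function name `fit_voting_points` is hypothetical; this is not the predictor's actual code or data.

```python
import numpy as np


def fit_voting_points(X, y):
    """Fit y ~ X @ coef + intercept by ordinary least squares.

    Returns (coef, intercept, r2), where r2 is the share of the variance
    in voting points that the fitted formula explains.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Add a column of ones so the fit also estimates a constant term.
    design = np.column_stack([X, np.ones(len(X))])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    coef, intercept = params[:-1], params[-1]
    # R^2 = 1 - SS_res / SS_tot
    residuals = y - design @ params
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, float(intercept), 1.0 - ss_res / ss_tot


# Synthetic, made-up data: 40 "players", three statistics each.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([30.0, -12.0, 5.0]) + 100.0 + rng.normal(scale=20.0, size=40)

coef, intercept, r2 = fit_voting_points(X, y)
print("coefficients:", coef.round(2), "constant:", round(intercept, 2), "R^2:", round(r2, 2))
```

With real data, each row of `X` would hold one player's season statistics and `y` his actual voting points; trying different columns and keeping the set with the highest R² mirrors the selection described above.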
To find the probability, take the voting points of each player whose predicted total is positive and divide them by half the total voting points for all players. If a player receives more than half of all the voting points, he wins the MVP 100% of the time. Expressed in mathematical form:
P(win) = min(1, VP / (ΣVP / 2)), where VP is the player's voting points and ΣVP is the total for all players.
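A minimal sketch of that probability rule follows. It assumes "total voting points" means the sum over players with positive predictions, so a negative prediction counts as no votes. The function name and player names are hypothetical.

```python
def win_probabilities(voting_points):
    """Map each player's predicted voting points to a probability of winning.

    P = min(1, VP / (half the total positive voting points)); players whose
    predicted voting points are zero or negative get 0. The values are not
    normalized, so they need not sum to 1.
    """
    # Assumption: only positive predictions count toward the total.
    half_total = sum(vp for vp in voting_points.values() if vp > 0) / 2
    return {
        name: min(1.0, vp / half_total) if vp > 0 else 0.0
        for name, vp in voting_points.items()
    }


# Hypothetical predicted voting points, not real players.
predicted = {"Player A": 300, "Player B": 100, "Player C": 50, "Player D": -20}
print(win_probabilities(predicted))
```

Here the positive total is 450, so half is 225: Player A's 300 exceeds it and is capped at 1.0, Player B gets 100/225, and Player D gets 0.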
Historical voting trends may soon be rendered insignificant as the new generation of sabermetrics becomes the prevailing form of player assessment. Yet making such an assumption would not only be a statement on my unlikely ability to gauge future voter temperament, it would also be devastating to my entire MVP Predictor. And I would assume such self-mutilating reflections are foreign to the standard-issue practices of bloggers, so let's assume these assumptions were never assumed.
As I said before, the regression excluded pitchers, so I developed an arbitrary formula whose goal was to produce pitcher voting points that yield a reasonable MVP ranking on a brief survey of the table. I came up with this:
“Playoffs” is a binary variable, so the points added are either 30 or 0.
The formula appears to work.
NL MVP Top 15:
I think it is safe to say either Braun or Kemp will win the MVP. And if the eventual NL Cy Young winner has any influence on the MVP, then Ryan Braun is the probable winner. I find it hard to believe that a third-place team will have both the Cy Young winner and the MVP winner. The joint probability of a mediocre team also having the two best players in the league is probably very low. This way of thinking may not be justifiable, though: if Kemp and Kershaw are the most deserving of their respective awards, why should any other factors come into play? Again, the table above merely displays an eleven-year voting trend and nothing more.
Past MVP voting was more contingent on team success, judging by the weight placed on the playoff variable (for hitters the coefficient is 78.3). But in the Moneyball era, when many teams are more concerned with buying wins than with acquiring stars, that mindset may spill over into the MVP voting process. If it does, then “valuable” simply means value to your team, regardless of how good the team is, and “valuable” and “the best” are collapsed into one all-encompassing stat, WAR. Fittingly, if value is decided by WAR, both leagues have a “most valuable” player on a mediocre team (Kemp in the NL, Bautista in the AL), and without that player, his team would “lose” the most wins. Does that make them the most “valuable”?
The Cy Young is much easier to digest than the ambiguous MVP award. The best pitcher in the league wins the Cy Young, and the eventual winner can be justified analytically with any metric that would warrant “best pitcher” status: wins, winning percentage, ERA, or sabermetrics. Of the top four candidates, I don't think anybody would be surprised if any one of them wins the Cy Young. And the voting trends since 2000 happen to align with the considerations of the “FanGraphs generation”: the top four NL pitchers in WAR also rank in the top four in probability of winning the Cy Young. Again, this is because the factors that go into evaluating pitching with advanced metrics inevitably produce results that align closely with what the “Triple Crown stats” show, and unlike WAR for position players, there is no erratic component (UZR, TZR) driving a player's performance rating.
With that in mind, at what point did the sportsbooks catch on to the kind of season Ian Kennedy is having? Looking over his preseason projections, nowhere do I see 20-4 and an ERA+ of 137 (ZiPS made a valiant effort, projecting 8-5 with a 125 ERA+, while MARCEL had 7-9 with a 106 ERA+).
Below is a graph charting his line for each start along with its moving average, both expressed as a probability, p.
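The post doesn't say exactly how each line was turned into a probability. Assuming the standard conversion of an American moneyline into an implied probability (with the bookmaker's margin left in) and a simple trailing moving average, a sketch might look like this. The window length and every line except the -160 mentioned in the post are hypothetical.

```python
def implied_probability(line):
    """Convert an American moneyline to an implied win probability.

    The bookmaker's margin (vig) is left in, so both sides of a game can sum
    to slightly more than 1.
    """
    if line < 0:
        return -line / (-line + 100)
    return 100 / (line + 100)


def moving_average(values, window):
    """Trailing moving average; the first few points average whatever is available."""
    averages = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1) : i + 1]
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical sequence of lines; only the -160 appears in the post.
lines = [-110, 105, -125, -140, -160]
p = [implied_probability(x) for x in lines]
print([round(x, 3) for x in p])
print([round(x, 3) for x in moving_average(p, window=3)])
```

Under this conversion a -160 line works out to 160/260, or about .615.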
From his first start until about June 5th, his 12th start of the season, against Jason Marquis (-160), his line rose steadily. Obviously this could be more a reflection of the Diamondbacks' performance than of Kennedy's market value. For comparison, Kennedy's line is only 1.9% higher than his team's line. Here are a few notable lines from his last few starts.
Thus it appears the sportsbooks do not think too highly of either Kennedy or the Diamondbacks. He currently ranks 37th of the 277 pitchers who have been listed on a Vegas card, though after adjusting for home/road starts and his opponents' lines, he moves up six spots to 31st, four spots behind his teammate Daniel Hudson, who has the same average opponent's line. This lack of respect for Kennedy explains how much he has earned from a bettor's perspective. Below are the top 10 starting pitchers in units earned this season.
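"Units earned" depends on the staking convention, which the post doesn't specify. Assuming a flat 1 unit risked on the pitcher's team in every start, the profit could be tallied as in this sketch; the function names and sample starts are hypothetical.

```python
def units_won(line, won, stake=1.0):
    """Profit in units from risking `stake` units on a moneyline bet."""
    if not won:
        return -stake
    if line < 0:
        # Favorite: risking |line| wins 100, so each unit risked wins 100/|line|.
        return stake * 100 / -line
    # Underdog: risking 100 wins `line`.
    return stake * line / 100


def season_units(starts):
    """Sum the profit over (line, team_won) pairs, one pair per start."""
    return sum(units_won(line, won) for line, won in starts)


# Hypothetical starts: (moneyline on the pitcher's team, did the team win?)
starts = [(-160, True), (130, True), (-120, False)]
print(round(season_units(starts), 3))
```

Under this convention a win at -160 returns 0.625 units and a win at +130 returns 1.3, which is why a pitcher the books underrate (shorter favorite or longer underdog than he deserves) piles up units quickly.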
Starting pitchers: Voting Points = W*8.71 + K*.14 + WAR*6.72 - ERA*39.87 - 46.80
R² = .58
Relief pitchers: Voting Points = WAR*14.72 - WHIP*108.24 + SV*2.19 - 7.95
R² = .37
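To apply the two regressions above, each can be wrapped in a function. Judging by the variables (W and K in the first, SV in the second), the first appears to be the starting-pitcher model and the second the reliever model; that labeling is an inference. The function names and stat lines below are hypothetical.

```python
def starter_voting_points(w, k, war, era):
    """First Cy Young regression above (R^2 = .58): W, K, WAR, ERA."""
    return w * 8.71 + k * 0.14 + war * 6.72 - era * 39.87 - 46.80


def reliever_voting_points(war, whip, sv):
    """Second Cy Young regression above (R^2 = .37): WAR, WHIP, saves."""
    return war * 14.72 - whip * 108.24 + sv * 2.19 - 7.95


# Hypothetical stat lines, not real pitchers.
print(starter_voting_points(w=20, k=200, war=6.0, era=2.50))
print(reliever_voting_points(war=3.0, whip=1.00, sv=45))
```

For the hypothetical starter, the terms are 174.2 + 28 + 40.32 - 99.675 - 46.80, or about 96 voting points; the hypothetical reliever comes out near 26.5.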
These formulas explain as much of the variance in voting points as I could find, and the resulting probabilities of winning seem to line up with reason. Analysis of relief pitchers is certainly limited by the number of observations relative to starting pitchers: of the 77 pitchers who have received Cy Young consideration since 2000, only 14 are relievers. As I've said before, it's unrealistic to expect voting behavior to reflect any sort of major statistical trend when access to information and technology increases exponentially every two years.
Voting Points = WAR*9.23 - ERA*21.94 + W*7.14 + K*.25 - 108.5
R² = .54
Everything was basically covered in the last post. It's a different data set (AL vs. NL), so the formula is slightly different. Again, I'm regressing voting points against a chosen set of statistics. To calculate the probability, note that a player has a 100% probability of winning whenever his voting points are equal to or greater than half the total voting points for all players.
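A sketch combining the AL regression above with the "half the total" rule might look like this. It assumes the total is summed over positive predictions only; the function names and candidate stat lines are made up for illustration.

```python
def al_voting_points(war, era, w, k):
    """AL regression above (R^2 = .54)."""
    return war * 9.23 - era * 21.94 + w * 7.14 + k * 0.25 - 108.5


def win_probability(vp, all_vp):
    """1.0 once VP >= half the total positive voting points; otherwise VP / half."""
    if vp <= 0:
        return 0.0
    half_total = sum(v for v in all_vp if v > 0) / 2
    return min(1.0, vp / half_total)


# Hypothetical AL candidates: (WAR, ERA, W, K)
candidates = {
    "Pitcher A": (7.0, 2.40, 22, 250),
    "Pitcher B": (5.0, 3.00, 16, 200),
    "Pitcher C": (2.0, 4.50, 10, 120),
}
vps = {name: al_voting_points(*stats) for name, stats in candidates.items()}
for name, vp in vps.items():
    print(name, round(vp, 1), round(win_probability(vp, vps.values()), 3))
```

Pitcher A's predicted points exceed half the positive total, so his probability is capped at 1.0, while Pitcher C's negative prediction maps to 0.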