Posts Tagged future

NL MVP Update

The National League was easier to analyze than the American League. By that I mean the information baseball-reference has on voting points since 2000 creates a more manageable data set, largely because so few NL pitchers have received MVP consideration: only five from 2000 to 2010. That is why I hadn’t previously included pitchers in the NL MVP Predictor.

The formula for voting points looks something like this:

ax + by + cz + kappa = V

kappa = some constant

Here the coefficients a, b, and c are found by regressing voting points on a number of different variables (it doesn’t have to be just three) that appear to be statistically significant. The choice of variables is up to the user, but the combination that explains the most variance in voting points is the one to prefer. In this case R2 = .59. The typical MVP winner earns around 250-300 voting points.
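For readers who want to see the mechanics, here is a minimal sketch of that kind of regression in plain Python. It is not the fit I actually ran: the function names, the three predictors (WAR, WPA, playoffs), and every number in the toy data are made up for illustration, and the toy points are generated from known coefficients so the fit can be checked.

```python
def fit_ols(rows, targets):
    """Least-squares fit of targets on rows plus a constant, via the normal equations."""
    # Append a column of ones so the last coefficient is the constant (kappa).
    X = [list(r) + [1.0] for r in rows]
    k = len(X[0])
    # Augmented normal-equation system [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def r_squared(rows, targets, coefs):
    """Share of the variance in targets explained by the fitted coefficients."""
    preds = [sum(c * v for c, v in zip(coefs, list(r) + [1.0])) for r in rows]
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - p) ** 2 for y, p in zip(targets, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: (WAR, WPA, playoffs) per player. Not real seasons.
TOY_ROWS = [(7.0, 6.0, 1), (9.0, 5.0, 0), (5.0, 7.0, 1),
            (4.0, 3.0, 1), (6.0, 4.0, 0), (3.0, 2.0, 0)]
# Points generated from V = 20*WAR + 10*WPA + 30*playoffs - 50 (made-up coefficients).
TOY_POINTS = [20 * w + 10 * p + 30 * po - 50 for w, p, po in TOY_ROWS]

coefs = fit_ols(TOY_ROWS, TOY_POINTS)
print("a, b, c, kappa:", coefs)
print("R2:", r_squared(TOY_ROWS, TOY_POINTS, coefs))
```

Because the toy points contain no noise, the fit recovers the made-up coefficients and R2 comes out at essentially 1. Real voting data is far messier, which is why the actual R2 is .59.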

To find the probability, take the voting points of each player whose total is positive and divide by half the total voting points for all players. Obviously, a player who receives more than 50% of the voting points will win the MVP 100% of the time. Expressed in mathematical form:

P_i = \frac{2 V_i}{\sum_{j=1}^{n} V_j}
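As a rough sketch of that calculation in Python: the function name and the sample numbers are hypothetical, and two details are my own reading of the description rather than anything spelled out: players with non-positive points are dropped from the total as well, and the result is capped at 100%.

```python
def win_probabilities(points):
    """Map predicted voting points to win probabilities, P_i = 2*V_i / sum(V)."""
    # Assumption: only players with positive predicted points are considered,
    # and only their points count toward the total.
    positive = {name: v for name, v in points.items() if v > 0}
    half_total = sum(positive.values()) / 2.0
    if half_total == 0:
        return {}
    # Assumption: anyone holding at least half of all points is capped at 100%.
    return {name: min(1.0, v / half_total) for name, v in positive.items()}


# Hypothetical predicted voting points, for illustration only.
sample = {"Player A": 100.0, "Player B": 60.0, "Player C": 40.0, "Player D": -15.0}
print(win_probabilities(sample))
```

Note that the probabilities are not meant to sum to 100%: dividing by half the total means they add up to 200% before any capping.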

Historical voting trends may soon be rendered insignificant as the new generation of sabermetrics becomes the prevailing form of player assessment. Yet to make such an assumption would not only be a general statement on my unlikely ability to gauge future voter temperament, but would also be devastating to my entire MVP Predictor. And I would assume such self-mutilating reflections are unknown to the standard-issue practices of bloggers, so let’s assume these assumptions were never assumed.

As I said before, the regression excluded pitchers, so I developed an arbitrary formula whose goal was to make the calculated voting points line up with an MVP ranking that looks reasonable after a brief survey of the table. I came up with this:

Voting Points = 10*WPA + 30*Playoffs + 25*WAR - 10*ERA - 100

“Playoffs” is a binary variable, so it adds either 30 points or 0.
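Written out as a small Python function, the pitcher formula looks like this. The function name and the stat line in the example are hypothetical; only the coefficients come from the formula above.

```python
def pitcher_mvp_points(wpa, war, era, made_playoffs):
    """Arbitrary pitcher formula: 10*WPA + 30*Playoffs + 25*WAR - 10*ERA - 100."""
    playoffs = 1 if made_playoffs else 0  # binary: adds 30 points or nothing
    return 10 * wpa + 30 * playoffs + 25 * war - 10 * era - 100


# Hypothetical stat line, for illustration only.
print(pitcher_mvp_points(wpa=4.0, war=7.0, era=2.5, made_playoffs=True))
print(pitcher_mvp_points(wpa=4.0, war=7.0, era=2.5, made_playoffs=False))
```

The two calls differ by exactly 30 points, which is the entire effect of the playoff variable for pitchers.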

The formula appears to work.

NL MVP Top 15:

NAME Team bWAR WPA PROB ODDS
Ryan Braun MIL 7.74 6.20 30.5% 227
Matt Kemp LAD 9.95 6.00 25.8% 287
Prince Fielder MIL 4.89 7.00 21.0% 377
Justin Upton ARI 4.48 3.10 14.7% 580
Roy Halladay PHI 7.23 4.20 13.0% 670
Albert Pujols STL 5.71 4.70 12.6% 696
Joey Votto CIN 6.72 7.20 12.5% 700
Cliff Lee PHI 6.83 4.00 10.9% 814
Ryan Howard PHI 2.65 4.40 10.6% 841
Hunter Pence PHI 4.99 2.80 8.4% 1089
Clayton Kershaw LAD 7.07 3.70 8.2% 1120
Ian Kennedy ARI 5.60 4.30 7.3% 1270
Cole Hamels PHI 5.50 4.00 5.8% 1620
Lance Berkman STL 4.99 5.40 5.6% 1674
Shane Victorino PHI 5.09 3.10 4.9% 1939

I think it is safe to say either Braun or Kemp will win the MVP. And if the eventual NL Cy Young winner has any influence on the MVP, then Ryan Braun is your probable winner. I find it hard to believe a third-place team will have both the Cy Young winner and the MVP winner. The joint probability of a mediocre team also having the two best players in the league is probably very low. This way of thinking might not be justifiable, though: if Kemp and Kershaw are the most deserving of their respective awards, why should any other factors come into play? Again, the table above merely displays an eleven-year voting trend and nothing more.

Previous MVP awards were more contingent on team success, judging by the significance of the playoff variable (for hitters the coefficient is 78.3). But in the “Moneyball” era, where many teams are more concerned with buying wins than with acquiring stars, that thinking may spill over into the MVP voting process. If it does, then “valuable” simply means value to your team, regardless of how good the team is, and “valuable” and “the best” collapse into one all-encompassing stat, WAR. Fittingly, if value is decided by WAR, both leagues have a “most valuable” player on a mediocre team (Kemp in the NL, Bautista in the AL), and without that player their respective teams would “lose” more wins. Does that make them the most “valuable”?

NL Cy Young Update and Ian Kennedy

 

NAME TEAM bWAR PROB ODDS
Clayton Kershaw LAD 7.09 57.28% -134
Roy Halladay PHI 7.57 38.39% 160
Ian Kennedy ARI 5.64 36.12% 177
Cliff Lee PHI 7.15 25.36% 294
Craig Kimbrel ATL 3.14 15.67% 538
Tim Lincecum SFG 4.73 11.46% 773
Cole Hamels PHI 5.47 9.04% 1006
Yovani Gallardo MIL 2.30 3.07% 3155
Daniel Hudson ARI 2.61 2.76% 3519
Tim Hudson ATL 3.55 0.56% 17904
J.J. Putz ARI 1.88 0.29% 34667
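A note on reading the ODDS column: it looks like the PROB column converted to American moneyline odds. That is the standard conversion and it reproduces the rows above, but treat it as an inference from the numbers. A minimal sketch, with a hypothetical function name:

```python
def american_odds(p):
    """Convert a win probability (0 < p < 1) to American moneyline odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    if p > 0.5:
        # Favorite: negative line, the amount risked to win 100.
        return -round(100 * p / (1 - p))
    # Underdog: positive line, the amount won on a 100 stake.
    return round(100 * (1 - p) / p)


# Probabilities taken from the table above.
for name, prob in [("Kershaw", 0.5728), ("Halladay", 0.3839), ("Kennedy", 0.3612)]:
    print(name, american_odds(prob))
```

Rows in other tables whose probabilities are shown to only one decimal place can come out a point or two off, since the odds were presumably computed from the unrounded values.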

The Cy Young is much easier to digest than the MVP award, with all its ambiguities. The best pitcher in the league wins the Cy Young, and the eventual winner can be justified with any metric that would warrant “best pitcher” status. That could be wins, winning percentage, ERA, or sabermetrics. I don’t think anybody would be surprised if any one of the top four candidates wins the Cy Young. And the voting trends since 2000 incidentally align with considerations implied by the “FanGraphs generation.” The top four in WAR for National League pitchers also rate in the top four in the probability of winning the Cy Young. Again, this is because evaluating pitching with advanced metrics inevitably leads to results that align closely with what the “Triple Crown stats” show, and unlike WAR for position players, there is no random dummy variable (UZR, TZR) deciding a player’s performance rating.

With that in mind, at what point did sportsbooks catch on to the type of season Ian Kennedy is having? In a brief survey of his preseason projections, nowhere do I see 20-4 and an ERA+ of 137 (ZiPS made a valiant effort, however, at 8-5 with a 125 ERA+, and MARCEL had 7-9 with a 106 ERA+).

Below is a graph charting each of his individual lines along with their moving average, expressed as a probability, p.

From his first start to about June 5th, his 12th start of the season, against Jason Marquis (-160), we saw a steady increase. Obviously this could be more an expression of the Diamondbacks’ performance than of Kennedy’s market value. But Kennedy’s line is only 1.9% higher than his team’s line. Here are a few notable lines over his last few starts.

Thus it appears the sportsbooks do not think too highly of either Kennedy or the Diamondbacks. He currently ranks 37th out of 277 pitchers who have been listed on a Vegas card, though when adjusted for the home/road start discrepancy and his opponents’ line, that ranking moves up six spots to 31st, four spots behind his teammate Daniel Hudson, who has the same average opponent’s line. The lack of respect for Kennedy explains his monetary intake from a bettor’s perspective. Here are the top 10 starting pitchers in units earned on the season.

RNK PLAYER GS TM W-L UNITS
1 I KENNEDY 32 24-8 16.7
2 J VERLANDER 33 25-8 13.9
3 V WORLEY 20 16-4 11.9
4 Z GREINKE 26 19-7 10.7
5 J MARQUIS 23 15-8 10.3
6 C KERSHAW 32 22-10 9.9
7 I NOVA 26 19-7 9.6
8 J WEAVER 32 22-10 9.4
9 R HALLADAY 31 23-8 8.4
10 J BECKETT 28 20-8 8.3

*From Statfox

NL Cy Young Update

NAME TEAM WAR PROB ODDS
Clayton Kershaw LAD 6.60 46.73% 114
Roy Halladay PHI 7.45 36.98% 170
Cliff Lee PHI 7.21 32.61% 207
Ian Kennedy ARI 5.29 31.61% 216
Craig Kimbrel ATL 3.70 23.85% 319
Cole Hamels PHI 6.27 15.37% 551
Tim Lincecum SFG 4.60 8.71% 1048
Tim Hudson ATL 3.70 2.91% 3341
Daniel Hudson ARI 2.30 1.22% 8083

SP

Voting Points = W*8.71 + K*.14 + WAR*6.72 - ERA*39.87 - 46.80
R2 = .58

RP

Voting Points = WAR*14.72 - WHIP*108.24 + SV*2.19 - 7.95
R2 = .37
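For anyone who wants to plug a pitcher’s line into these, here is a small Python sketch of the two formulas. The function names and the example stat lines are hypothetical; the coefficients are the ones listed above.

```python
def sp_voting_points(wins, strikeouts, war, era):
    """Starting pitchers: W*8.71 + K*.14 + WAR*6.72 - ERA*39.87 - 46.80."""
    return wins * 8.71 + strikeouts * 0.14 + war * 6.72 - era * 39.87 - 46.80


def rp_voting_points(war, whip, saves):
    """Relief pitchers: WAR*14.72 - WHIP*108.24 + SV*2.19 - 7.95."""
    return war * 14.72 - whip * 108.24 + saves * 2.19 - 7.95


# Hypothetical stat lines, for illustration only.
print(sp_voting_points(wins=20, strikeouts=240, war=7.0, era=2.30))
print(rp_voting_points(war=3.0, whip=1.00, saves=45))
```

Both formulas can return negative values for weaker seasons; under the probability calculation described in the MVP post, those pitchers would simply drop out of consideration.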

These are the formulas that explain as much of the variance in voting points as I could find. The probabilities of winning seem to line up with reason. Certainly the analysis of relief pitchers is limited by the number of observations, relative to starting pitchers. Out of the 77 pitchers who have received Cy Young consideration since 2000, only 14 are relievers. Again, as I’ve said before, it’s unrealistic to expect voting behavior to reflect any sort of major statistical trend when access to information and technology increases exponentially every two years.

AL Cy Young Update

NAME WAR PROB ODDS
Justin Verlander 8.75 58.65% -142
CC Sabathia 6.48 37.48% 167
Jered Weaver 7.33 29.77% 236
Felix Hernandez 5.72 16.03% 524
Ricky Romero 6.39 13.28% 653
Jon Lester 5.48 13.01% 669
James Shields 5.48 11.74% 752
Josh Beckett 6.67 8.04% 1143
C.J. Wilson 4.46 6.73% 1385
Dan Haren 4.26 3.05% 3180
David Price 4.17 1.92% 5113
Gio Gonzalez 4.73 0.30% 3314

Voting Points = WAR*9.23 - ERA*21.94 + W*7.14 + K*.25 - 108.5

R2 = .54

NL MVP Updated

NAME Team WAR WPA PROB ODDS
Ryan Braun MIL 6.7 4.58 34.65% 188
Matt Kemp LAD 7.8 4.60 28.68% 248
Prince Fielder MIL 4.1 5.48 24.42% 309
Joey Votto CIN 6.1 6.86 19.98% 400
Justin Upton ARI 3.7 2.78 18.10% 452
Shane Victorino PHI 5.1 3.50 12.78% 682
Ryan Howard PHI 2 2.97 11.64% 758
Albert Pujols STL 4.4 3.08 11.40% 776
Troy Tulowitzki COL 5.6 2.35 9.12% 996
Hunter Pence PHI 3.7 2.59 8.74% 1044
Lance Berkman STL 4.1 4.26 7.62% 1211
Michael Bourn ATL 5 1.57 2.97% 3271
Ryan Roberts ARI 3.3 2.21 2.70% 3602
Brian McCann ATL 2.6 1.05 2.34% 4169

Everything was basically covered in the last post. It’s a different data set (AL vs NL), so the formula is slightly different. Again, I’m regressing voting points against a certain collection of statistics. To calculate the probability, one has to consider that a player has a 100% probability of winning whenever his voting points are equal to or greater than half the total voting points for all players.
