Posts Tagged Lines
NL/AL MVP
Posted by Rufio Magillicutty in MLB, MVP on July 21, 2012
Sportsbooks haven’t convened MVP odds yet because I haven’t posted them myself. This is an obvious observation to anybody that visits this blog on a yearly basis. I think we’d all agree on this. (I use the terms “we’d all” and “nobody in particular” interchangeably).
The formula behind setting a probability on a given player’s chances can be expressed as:
PROB = v / (0.5 * Σv)
where Σv is the sum of v over every player with a positive v.
If a player doesn’t register a positive number of MVP points, the variable v, then he is simply ignored. The points are calculated slightly differently in the NL and AL, and the years 2000-2010 were used to fit the data. This has already been explained on multiple occasions.
For AL batters and pitchers:
The “PLAYOFFS” variable is either 1 or 0, and in-season playoff projections are essentially the current standings.
For all NL batters and pitchers:
The motivation for using WAR and WPA as primary coefficients stemmed from this post, which I found quite interesting.
At the bottom of the post I’ve attached some relevant Excel files. I’m not going to post any more about this (I’ll do Cy Young this weekend and attach the necessary files); there really shouldn’t be any reason for me to have to. I also never want to have to use or look at an Excel file ever again. But if I get enough requests via Twitter/email/comments I’ll make a dedicated page that updates daily, probably using my own WAR calculations instead of bRef’s mess of drivel, and some server-side scripting.
Last year the formula picked Ryan Braun and Miguel Cabrera. Verlander, I think we can all agree, should not have won the MVP.
NL MVP
| NAME | Team | bWAR | WPA | PROB | ODDS |
| Andrew McCutchen | PIT | 5.1 | 3.2 | 52.15% | -108 |
| Ryan Braun | MIL | 3.9 | 3 | 33.73% | 196 |
| Joey Votto | CIN | 4.5 | 5.2 | 33.42% | 199 |
| Melky Cabrera | SFG | 3.8 | 2.7 | 18.64% | 436 |
| Johnny Cueto | CIN | 4 | 2 | 15.18% | 559 |
| Carlos Gonzalez | COL | 1.6 | 1.8 | 8.73% | 1045 |
| Carlos Beltran | STL | 2.3 | 1.8 | 6.49% | 1441 |
| Matt Holliday | STL | 3.6 | 2.8 | 6.32% | 1482 |
| Buster Posey | SFG | 2.8 | 1.5 | 5.44% | 1738 |
| Ian Desmond | WSN | 2.3 | 3.5 | 5.41% | 1748 |
| Pedro Alvarez | PIT | 2 | 1.1 | 5.29% | 1790 |
| Jay Bruce | CIN | 1.1 | 0.2 | 4.13% | 2321 |
| Giancarlo Stanton | MIA | 3 | 2.5 | 2.34% | 4174 |
| Ryan Vogelsong | SFG | 2.8 | 2.1 | 2.18% | 4487 |
| Brandon Phillips | CIN | 2.2 | 0.8 | 0.55% | 18082 |
AL MVP
| NAME | TEAM | bWAR | WPA | PROB | ODDS |
| Mike Trout | LAA | 5.3 | 0.5 | 34.66% | 188 |
| Robinson Cano | NYY | 5 | 1.6 | 31.16% | 221 |
| Josh Hamilton | TEX | 3.2 | 1.2 | 22.88% | 337 |
| Adrian Beltre | TEX | 3 | 1.6 | 22.13% | 352 |
| Mark Trumbo | LAA | 3.2 | 0 | 18.48% | 441 |
| Josh Reddick | OAK | 3.8 | 4.2 | 14.96% | 568 |
| Alex Rios | CHW | 2.6 | 1.7 | 14.61% | 584 |
| Miguel Cabrera | DET | 3.5 | 2.2 | 14.42% | 593 |
| David Ortiz | BOS | 2.7 | 2.5 | 6.46% | 1448 |
| Matt Harrison | TEX | 4.1 | 2.5 | 5.95% | 1581 |
| Fernando Rodney | TBR | 1.9 | 2.8 | 5.87% | 1604 |
| Justin Verlander | DET | 5 | 3.1 | 3.51% | 2749 |
| Chris Sale | CHW | 4.7 | 3 | 3.18% | 3045 |
| Edwin Encarnacion | TOR | 3 | 2.5 | 1.72% | 5716 |
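The ODDS column in both tables appears to be the fair (no-vig) American-odds equivalent of PROB, truncated to a whole number. Here is a minimal Python sketch of that conversion. The function name `prob_to_american` is made up for illustration, and the truncation is an assumption inferred from the table values, not something taken from the attached Excel files.

```python
def prob_to_american(p: float) -> int:
    """Convert a win probability (0 < p < 1) to fair American odds.

    Assumption: the result is truncated toward zero, which matches the
    rows checked against the tables above.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if p > 0.5:
        # Favorite: amount risked to win 100, shown as a negative number.
        return -int(100.0 * p / (1.0 - p))
    # Underdog (or even money): amount won per 100 risked.
    return int(100.0 * (1.0 - p) / p)


print(prob_to_american(0.5215))  # McCutchen's PROB
print(prob_to_american(0.3373))  # Braun's PROB
```

Because PROB is displayed rounded to two decimals, feeding the displayed value back in can land one point away from the listed ODDS for some long shots.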
Here are the files. The “NLMVP_ODDS” and “ALMVP_ODDS” files require a data refresh and some sorting. Feel free to change the coefficients, I don’t care. Some files may be irrelevant, not sure. I just threw a bunch of seemingly related files in an archive.
How to Build a Line Database
Apologies for not having provided any content lately (my tweets have certainly offended about ten users). I would have written this months ago but I didn’t.
Let me preface this further by saying that, assuming you will be building the database on a local web server, I highly recommend running the server on a computer other than your primary one. I have an old Toshiba laptop that is running Debian (Debian 6.0 is the latest version) and sits in the back of my closet.
In a previous post I uploaded an Excel file that automatically extracts lines from Pinnacle and inserts them into an Access database on open (keep the file open and invoke the “Application.OnTime” VBA function for a recurring call, or set up a Windows Task Scheduler event). But that requires Windows, and ideally one would want a solution that can be applied across various operating systems. PHP and MySQL are one such solution. Linux users can simply download Apache, PHP, and MySQL from the repository. Windows or Mac users might want to look into downloading XAMPP. PHP is a server-side scripting language, so it operates via some sort of web server, such as Apache (if PHP is unfamiliar, just read the code carefully and it should be pretty straightforward). And MySQL provides the database structure and query language, and can be interfaced with most programming languages. I would also suggest setting up an SSH connection from one computer on the network to the one running the server.
Here is my SQL table structure configured for baseball lines from Pinnacle (assuming a database has already been created):
CREATE TABLE IF NOT EXISTS `LINES` (
  `Date` varchar(55) NOT NULL,
  `vRot` varchar(5) NOT NULL,
  `Away` varchar(55) NOT NULL,
  `vListed` varchar(55) NOT NULL,
  `vLine` varchar(12) NOT NULL,
  `vTotal` varchar(12) NOT NULL,
  `vML` int(11) NOT NULL,
  `hRot` varchar(5) NOT NULL,
  `Home` varchar(55) NOT NULL,
  `hListed` varchar(55) NOT NULL,
  `hLine` varchar(12) NOT NULL,
  `hTotal` varchar(12) NOT NULL,
  `hML` int(11) NOT NULL,
  `nowTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  UNIQUE KEY `ID` (`Date`,`vRot`,`vListed`,`hRot`,`hListed`,`hML`,`hTotal`,`hLine`,`vML`,`vTotal`,`vLine`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
The ‘nowTime’ column automatically records the current time on each insert. This table is meant to accommodate those interested in tracking line movement, because Pinnacle’s XML updates every time new information is added. To take advantage of this, make an intermittent call (Pinnacle requires at least 60 seconds between calls) in whatever fashion is most convenient for the programmer (cron job, delayed loop…). And to avoid redundant database inserts, it is essential to put a unique key across every line column and use the ‘INSERT IGNORE’ SQL command, so that a poll that finds nothing changed adds no row.
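To see why the unique key plus ‘INSERT IGNORE’ only stores actual line movement, here is a small self-contained Python sketch. It uses an in-memory SQLite database and SQLite's `INSERT OR IGNORE` as a stand-in for MySQL's `INSERT IGNORE`. The cut-down table, the column names and the sample lines are all made up for illustration.

```python
import sqlite3

# In-memory stand-in for the MySQL LINES table (hypothetical, reduced columns).
con = sqlite3.connect(":memory:")
con.execute(
    """
    CREATE TABLE lines (
        game_date TEXT NOT NULL,
        v_rot     TEXT NOT NULL,
        v_ml      INTEGER NOT NULL,
        h_ml      INTEGER NOT NULL,
        now_time  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        UNIQUE (game_date, v_rot, v_ml, h_ml)
    )
    """
)


def store(snapshot):
    """Insert one polled line; an exact repeat violates the unique key and is skipped."""
    con.execute(
        "INSERT OR IGNORE INTO lines (game_date, v_rot, v_ml, h_ml) VALUES (?, ?, ?, ?)",
        snapshot,
    )


store(("2012-07-21 23:05", "951", 120, -130))  # first poll
store(("2012-07-21 23:05", "951", 120, -130))  # same line again: ignored
store(("2012-07-21 23:05", "951", 115, -125))  # line moved: stored

row_count = con.execute("SELECT COUNT(*) FROM lines").fetchone()[0]
print(row_count)  # two rows: the opener and the move
```

The timestamp column is deliberately left out of the unique key; otherwise every poll would look new.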
Again, I’m using PHP, and here is my PHP code to grab MLB lines from Pinnacle and insert them into the above SQL table (my database name is ‘MLB’):
<?php
// Note: the mysql_* functions used here were removed in PHP 7;
// on newer versions use the mysqli_* equivalents.
//error_reporting(0);
$host = 'localhost';
$username = 'USER';
$pswrd = 'PASS';

$con = mysql_connect($host, $username, $pswrd);
if (!$con) {
    die('Could not connect: ' . mysql_error());
}
mysql_select_db("MLB", $con) or die('Error while selecting db');

// Load the Pinnacle MLB feed.
$xmldoc = new DOMDocument();
$url = 'http://xml.pinnaclesports.com/pinnacleFeed.aspx?sporttype=baseball&sportsubtype=MLB';
if (!$xmldoc->load($url)) {
    die('Could not load the Pinnacle feed');
}
$doc = $xmldoc->documentElement;
$event = $doc->getElementsByTagName("event");

foreach ($event as $ev) {
    $ml_v = $ev->getElementsByTagName("moneyline_visiting")->item(0)->nodeValue;
    $ml_h = $ev->getElementsByTagName("moneyline_home")->item(0)->nodeValue;
    // Skip events that have no moneyline posted yet.
    if ($ml_h == "") {
        continue;
    }

    // Totals: the points plus the over price (visitor row) or under price (home row).
    $total_v = $ev->getElementsByTagName("total_points")->item(0)->nodeValue . " " . $ev->getElementsByTagName("over_adjust")->item(0)->nodeValue;
    $total_h = $ev->getElementsByTagName("total_points")->item(0)->nodeValue . " " . $ev->getElementsByTagName("under_adjust")->item(0)->nodeValue;

    $d = $ev->getElementsByTagName("event_datetimeGMT")->item(0)->nodeValue;

    // Team names, with apostrophes stripped so they cannot break the SQL string.
    $teamnames = $ev->getElementsByTagName("participant_name");
    $name_v = str_replace("'", "", $teamnames->item(0)->nodeValue);
    $name_h = str_replace("'", "", $teamnames->item(1)->nodeValue);

    $rot = $ev->getElementsByTagName("rotnum");
    $rotv = $rot->item(0)->nodeValue;
    $roth = $rot->item(1)->nodeValue;

    $pitcher = $ev->getElementsByTagName("pitcher");
    $pitch_v = mysql_real_escape_string($pitcher->item(0)->nodeValue);
    $pitch_h = mysql_real_escape_string($pitcher->item(1)->nodeValue);

    // Run lines: the spread plus its price.
    $spread_v = $ev->getElementsByTagName("spread_visiting")->item(0)->nodeValue . " " . $ev->getElementsByTagName("spread_adjust_visiting")->item(0)->nodeValue;
    $spread_h = $ev->getElementsByTagName("spread_home")->item(0)->nodeValue . " " . $ev->getElementsByTagName("spread_adjust_home")->item(0)->nodeValue;

    // INSERT IGNORE skips rows that duplicate the table's unique key.
    $sql = "INSERT IGNORE INTO MLB.LINES (Date,vRot,Away,vListed,vLine,vTotal,vML,hRot,Home,hListed,hLine,hTotal,hML) Values ('$d','$rotv','$name_v','$pitch_v','$spread_v','$total_v','$ml_v','$roth','$name_h','$pitch_h','$spread_h','$total_h','$ml_h')";
    $query = mysql_query($sql, $con);
    if (!$query) {
        die('Could not insert values: ' . mysql_error());
    }
}
mysql_close($con);
You can use whatever language you want, some are more comfortable with python, perl, javascript, brainfuck, etc…
What is important is knowing how to access your MySQL database from the script and how to navigate the Pinnacle XML file.
As a parenthetical, I previously mentioned running a cron job. In Windows, one may have to use the Task Scheduler. On Mac or Linux, the ability to run a cron job should already be set up; just edit the crontab file. For example, a Linux user simply has to type in a terminal:
crontab -e
And add the line:
*/2 * * * * /usr/bin/php path/to/php/file.php
This simply means that every two minutes (“*/2”) the PHP file will be run by the program “php.”
Now if everything works, we can start to present the lines in a nice HTML table. First, create a PHP file to query the database, grabbing the latest lines for each game listed at Pinnacle, and outputting the information in JSON format. This can be a bit tricky, but here is my solution (after connecting to a database with the name ‘MLB’):
...
// Latest snapshot of every game (keyed by the visiting rotation number)
// whose start time has not yet passed.
$sql = "SELECT *
        FROM MLB.LINES AS m
        INNER JOIN (
            SELECT c.vRot, MAX(c.nowTime) AS maxtime
            FROM MLB.LINES AS c
            GROUP BY c.vRot
        ) AS a ON m.vRot = a.vRot AND m.nowTime = a.maxtime
        WHERE NOW() <= DATE_SUB(m.Date, INTERVAL 4 HOUR)";
$results = mysql_query($sql);

// Start with an empty list so the output is valid JSON even when no games are listed.
$array = array();
$i = 0;
while ($row = mysql_fetch_assoc($results)) {
    $array[$i]['D'] = $row['Date'];
    $array[$i]['hRot'] = $row['hRot'];
    $array[$i]['vRot'] = $row['vRot'];
    $array[$i]['Away'] = $row['Away'];
    $array[$i]['Home'] = $row['Home'];
    $array[$i]['vListed'] = $row['vListed'];
    $array[$i]['hListed'] = $row['hListed'];
    $array[$i]['vML'] = $row['vML'];
    $array[$i]['hML'] = $row['hML'];
    $array[$i]['vTotal'] = $row['vTotal'];
    $array[$i]['hTotal'] = $row['hTotal'];
    $array[$i]['vLine'] = $row['vLine'];
    $array[$i]['hLine'] = $row['hLine'];
    $i++;
}
header('Content-type: application/json');
echo json_encode($array);
...
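The tricky part of that query is the self-join that keeps only the newest row per game. Here is a stripped-down illustration in Python against an in-memory SQLite table. The reduced table, the column names and the sample rows are hypothetical; only the join pattern mirrors the query above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE lines (v_rot TEXT, v_ml INTEGER, now_time TEXT)")
con.executemany(
    "INSERT INTO lines VALUES (?, ?, ?)",
    [
        ("951", 120, "2012-07-21 12:00:00"),   # opener for game 951
        ("951", 115, "2012-07-21 12:02:00"),   # later move for game 951
        ("953", -140, "2012-07-21 12:00:00"),  # only snapshot for game 953
    ],
)

# The inner query finds the newest timestamp per game; the join then pulls
# the full row that carries that timestamp.
latest = con.execute(
    """
    SELECT m.v_rot, m.v_ml
    FROM lines AS m
    INNER JOIN (
        SELECT v_rot, MAX(now_time) AS maxtime
        FROM lines
        GROUP BY v_rot
    ) AS a ON m.v_rot = a.v_rot AND m.now_time = a.maxtime
    ORDER BY m.v_rot
    """
).fetchall()

print(latest)  # one row per game: the 115 move for 951, the lone -140 for 953
```

A plain `GROUP BY` with `MAX(nowTime)` on its own would return the newest timestamp but not necessarily the prices that go with it, which is why the join is needed.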
The advantage of jQuery is that it can make background calls to another file on the server and parse the information returned by that file. The “$.getJSON” function makes the request and parses the response, and “setInterval” repeats the call every 120 seconds (120000 milliseconds) against the PHP file “MLB_Pinny.php”:
jQuery(document).ready(function ($) {
    var timeout, d, cur, update;

    // Note: toString(format), addHours, getDayName and Date.today are not
    // native JavaScript; they match the Datejs library, which must be loaded.
    function getPinny() {
        // The timestamp in the query string keeps the browser from caching the response.
        $.getJSON('php/MLB_Pinny.php?' + new Date().getTime(), function (json_data) {
            update = new Date();
            $('#update td:first').text('LAST UPDATE: ' + update.toString("yyyy-MM-dd h:mm"));

            // Remove the old rows, keeping the first (header) row of each section.
            $('#today tr:not(:first)').remove();
            $('#tomorrow tr:not(:first)').remove();

            $.each(json_data, function (i, item) {
                // Shift the GMT start time back four hours.
                d = Date.parse(item.D).addHours(-4);
                cur = (d.getDayName() == Date.today().getDayName()) ? "today" : "tomorrow";
                $("#" + cur).append($('<tr><td rowspan="2">' + d.toString("yyyy-MM-dd h:mm") + '</td><td>' + item.vRot + '</td><td class="team">' + item.Away + '</td><td class="pitch">' + item.vListed + '</td><td class="ml">' + item.vML + '</td><td>' + item.vTotal + '</td><td>' + item.vLine + '</td></tr><tr><td>' + item.hRot + '</td><td class="team">' + item.Home + '</td><td class="pitch">' + item.hListed + '</td><td class="ml">' + item.hML + '</td><td>' + item.hTotal + '</td><td>' + item.hLine + '</td></tr>'));
            });
        });
    }

    getPinny();
    timeout = setInterval(getPinny, 120000);
});
Mine looks like this:
I have two “tbody” sections, one with ‘id = “today”‘ and the other ‘id = “tomorrow”‘. This should be self-explanatory.
Feel free to add some table enhancements (in this case, a toggle):
// Clicking a section's first row shows or hides the rest of that section.
$('#today').children('tr:eq(0)').click(function () {
    $('#today').children('tr:gt(0)').toggle();
});
$('#tomorrow').children('tr:eq(0)').click(function () {
    $('#tomorrow').children('tr:gt(0)').toggle();
});
It would be nice to include the ability to query a pitcher’s closing lines for each start:
...
if (isset($_GET['pitch'])) {
    // The front-end sends "F%Last"; restore the "F. Last" form Pinnacle uses.
    $query = str_replace("%", ". ", $_GET['pitch']);
    // Escape the user-supplied value before placing it in the SQL string.
    $safe = mysql_real_escape_string($query, $con);
    $sql = "SELECT *
            FROM MLB.LINES AS m
            WHERE vListed LIKE '%$safe%' OR hListed LIKE '%$safe%'
            ORDER BY nowTime DESC";
} else {
    header('HTTP/1.1 400 Bad Request');
    die('Please use correct parameters');
}

$result = mysql_query($sql, $con) or die('Error while executing query' . mysql_error() . "\n");

echo '<html><head></head><body>
<table border=1><thead><tr><th>Date</th><th>Away</th><th>vListed</th><th>vML</th><th>Home</th><th>hListed</th><th>hML</th></tr></thead><tbody>';

$d = null;
while ($row = mysql_fetch_assoc($result)) {
    // Rows are newest first, so the first row seen for a game date is its closing line.
    if ($d == $row['Date']) continue;

    // Bold whichever side the queried pitcher started for.
    if ($row['vListed'] == $query) {
        $boldvml = "<strong>" . $row['vML'] . "</strong>";
        $boldvnm = "<strong>" . $row['vListed'] . "</strong>";
        $boldvtm = "<strong>" . $row['Away'] . "</strong>";
        $boldhml = $row['hML'];
        $boldhnm = $row['hListed'];
        $boldhtm = $row['Home'];
    } else {
        $boldhml = "<strong>" . $row['hML'] . "</strong>";
        $boldhnm = "<strong>" . $row['hListed'] . "</strong>";
        $boldhtm = "<strong>" . $row['Home'] . "</strong>";
        $boldvml = $row['vML'];
        $boldvnm = $row['vListed'];
        $boldvtm = $row['Away'];
    }
    $d = $row['Date'];

    // Shift the stored GMT time back four hours; "H:i" is 24-hour hours and minutes.
    echo '<tr><td>' . date("Y-m-d H:i", strtotime($row['Date']) - 4 * 3600) . '</td><td>' . $boldvtm . '</td><td>' . $boldvnm . '</td><td>' . $boldvml . '</td><td>' . $boldhtm . '</td><td>' . $boldhnm . '</td><td>' . $boldhml . '</td></tr>';
}
echo '</tbody></table></body></html>';
...
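Additionally, the cell with the starter’s name needs to be clickable. jQuery can do this:
// Delegated handler, so it also covers rows added after each refresh.
// (.on() replaces .live(), which was removed in jQuery 1.9.)
$(document).on('click', '.pitch', function () {
    window.open('php/linedb.php?pitch=' + $(this).text().replace(" ", "%").replace(".", ""));
});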
Occasionally, Pinnacle has a listed starter in the format “F LAST” rather than “F. Last”; this usually occurs when there is a late change in the listed starter or the pitcher is making his/her first start. Hence, there are some minor whitespace and trimming issues that for now seem to be resolved with some of the above code.
Hopefully what all this accomplishes is a personal Pinnacle line service, one that updates every 60+ seconds without having to refresh the browser or re-run a query. One could easily adapt the PHP code for different sports. Obviously, basketball and football do not have listed starters; other than that, the PHP code should work fine once pointed to the relevant Pinnacle XML file (or any other sportsbook’s).
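One way to sidestep the “F LAST” versus “F. Last” mismatch is to compare starters on a normalized key rather than on the raw string. This is only a sketch in Python, and the helper name `starter_key` is hypothetical. It is not what the code above does.

```python
def starter_key(name: str) -> str:
    """Collapse 'F LAST' and 'F. Last' to the same lookup key.

    Periods become spaces, runs of whitespace collapse to one space,
    and case is ignored.
    """
    return " ".join(name.replace(".", " ").split()).casefold()


print(starter_key("J. Verlander") == starter_key("J VERLANDER"))
```

Storing such a key alongside the listed name would let a pitcher query match both spellings without loosening the `LIKE` pattern.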
I haven’t updated this in a while, but on my github account there is a “SP-DATABASE” project. More importantly, various PHP and MySQL files are provided that can be used independently of the html front-end, and provide a template to abuse Pinnacle.
NCAA Tourney KP vs Pinny
Posted by Rufio Magillicutty in Betting, NCAAB, Pinnacle on March 15, 2012
Same thing as conference tournaments. SEC Field hit at 3/1 odds, the other four lost. A brief survey of a hypothetical bankroll outcome demonstrated the prodigious and frightening force of the Kelly Criterion and all the emotional turmoil likely to beget its constituency. Flat bettors would have come away in the negative, but with an air of optimism and satisfaction having lingered for hitting a future.
KenPom’s LOG5 predictions are here. If you don’t know what that means, to wit:
LOG5 = (a - a * b) / (a + b - 2 * a * b)
“a” and “b” here are winning percentages, and the result is the probability that team “a” beats team “b”. KenPom uses his Pythagorean winning percentages calculated by PPP and tempo rather than just points scored for and against, with an exponent of around 12.
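As a quick worked example of the LOG5 formula, here is a small Python function. The winning percentages passed in are made up for illustration and are not KenPom's numbers.

```python
def log5(a: float, b: float) -> float:
    """Probability that a team with winning percentage a beats one with b."""
    denom = a + b - 2.0 * a * b
    if denom == 0.0:
        # Happens only when both inputs are 0 or both are 1.
        raise ValueError("LOG5 is undefined for these inputs")
    return (a - a * b) / denom


# A .900 team against a .600 team (hypothetical values).
print(round(log5(0.9, 0.6), 4))
# The two sides of a matchup always sum to 1.
print(round(log5(0.9, 0.6) + log5(0.6, 0.9), 4))
```

Chaining these head-to-head probabilities through a bracket is what produces region and championship percentages like the ones in the tables below.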
(Numbers in each cell represent percentages sans the non-obligatory “%” symbol).
TOP 5 (KP-P)
| TEAM | REGION | TEAM | CHAMP |
| Ohio St | 10.54 | Ohio St | 3.55 |
| Mich St | 7.64 | Wisconsin | 2.24 |
| Wisconsin | 6.97 | Mich St | 1.98 |
| Kansas | 6.8 | Kansas | 1.74 |
| Indiana | 3.38 | Indiana | 0.67 |
Mr. Pomeroy “likes” the Big Ten, Pinnacle doesn’t.
SOUTH
| TEAM | KP REGION | KP CHAMP | PINNY REGION | PINNY CHAMP | KP-P REGION | KP-P CHAMP |
| Kentucky | 47.9 | 19.7 | 47.4 | 27.78 | 0.5 | -8.08 |
| Wichita St | 11.8 | 2.6 | 8.43 | 2.32 | 3.37 | 0.28 |
| Indiana | 9.2 | 1.7 | 5.82 | 1.03 | 3.38 | 0.67 |
| Baylor | 10.9 | 1.7 | 12.08 | 2.82 | -1.18 | -1.12 |
| Duke | 9.5 | 1.7 | 12.08 | 4.8 | -2.58 | -3.1 |
| UNLV | 3 | 0.2 | 3.51 | 0.73 | -0.51 | -0.53 |
| Iowa St. | 1.7 | 0.1 | 1.31 | 0.42 | 0.39 | -0.32 |
| Notre Dame | 1.9 | 0.1 | 1.96 | 0.44 | -0.06 | -0.34 |
| Uconn | 0.9 | 0.06 | 2.58 | 1.07 | -1.68 | -1.01 |
| Xavier | 0.09 | 0.04 | 1.32 | 0.43 | -1.23 | -0.39 |
| S Dakota St. | 0.8 | 0.03 | 0.41 | 0.29 | 0.39 | -0.26 |
| VCU | 0.5 | 0.02 | 0.79 | 0.29 | -0.29 | -0.27 |
| Colorado | 0.4 | 0.01 | 0.67 | 0.29 | -0.27 | -0.28 |
| NMSU | 0.3 | 0.01 | 0.41 | 0.35 | -0.11 | -0.34 |
| Lehigh | 0.3 | 0.007 | 0.4 | 0.21 | -0.1 | -0.203 |
| WKY | 0.001 | | 0.82 | 0.32 | -0.819 | -0.32 |
MIDWEST
| TEAM | KP REGION | KP CHAMP | PINNY REGION | PINNY CHAMP | KP-P REGION | KP-P CHAMP |
| UNC | 28.5 | 6.6 | 32.95 | 13.64 | -4.45 | -7.04 |
| Kansas | 33.7 | 9.1 | 26.9 | 7.36 | 6.8 | 1.74 |
| Gtown | 9.7 | 1.4 | 7.31 | 1.45 | 2.39 | -0.05 |
| Michigan | 5.7 | 0.5 | 4.57 | 0.88 | 1.13 | -0.38 |
| Temple | 2.3 | 0.1 | 3.92 | 0.64 | -1.62 | -0.54 |
| SDSU | 0.9 | 0.03 | 2.65 | 0.52 | -1.75 | -0.49 |
| St. Mary’s | 1.2 | 0.05 | 2.65 | 0.59 | -1.45 | -0.54 |
| Creighton | 2 | 0.1 | 1.61 | 0.43 | 0.39 | -0.33 |
| Alabama | 3.1 | 0.2 | 2.04 | 0.57 | 1.06 | -0.37 |
| Purdue | 3.9 | 0.3 | 3.92 | 0.73 | -0.02 | -0.43 |
| NC State | 1.5 | 0.07 | 4.57 | 0.73 | -3.07 | -0.66 |
| USF | 0.3 | 0.008 | 0.81 | 0.66 | -0.51 | -0.652 |
| Ohio | 0.5 | 0.01 | 0.81 | 0.29 | -0.31 | -0.28 |
| Belmont | 4 | 0.03 | 3.92 | 0.85 | 0.08 | -0.82 |
| Detroit | 0.07 | | 0.54 | 0.21 | -0.47 | -0.21 |
| Vermont | 0.03 | | 0.84 | 0.39 | -0.81 | -0.39 |
WEST
| TEAM | KP REGION | KP CHAMP | PINNY REGION | PINNY CHAMP | KP-P REGION | KP-P CHAMP |
| Mich St | 35.2 | 12.4 | 27.56 | 10.42 | 7.64 | 1.98 |
| Missouri | 23.1 | 5.3 | 22.63 | 8.31 | 0.47 | -3.01 |
| Memphis | 8.2 | 1.7 | 5.67 | 1.61 | 2.53 | 0.09 |
| New Mexico | 7.1 | 1 | 7.84 | 1.33 | -0.74 | -0.33 |
| Marquette | 7.5 | 0.9 | 9 | 2.34 | -1.5 | -1.44 |
| Loserville | 4.7 | 0.5 | 9.08 | 2.61 | -4.38 | -2.11 |
| Florida | 4.4 | 0.5 | 3.97 | 0.8 | 0.43 | -0.3 |
| St. Louis | 3.4 | 0.5 | 2.2 | 0.57 | 1.2 | -0.07 |
| Virginia | 2.5 | 0.2 | 1.78 | 0.43 | 0.72 | -0.23 |
| Murray St. | 1.4 | 0.07 | 3.05 | 0.73 | -1.65 | -0.66 |
| LBSU | 1 | 0.06 | 1.3 | 0.29 | -0.3 | -0.23 |
| BYU | 0.5 | 0.02 | 3.91 | 0.97 | -3.41 | -0.95 |
| Davidson | 0.3 | 0.009 | 0.71 | 0.29 | -0.41 | -0.281 |
| Colorado St. | 0.4 | 0.008 | 0.52 | 0.29 | -0.12 | -0.282 |
| LIU | 0.003 | | 0.39 | 0.17 | -0.387 | -0.17 |
| Norfolk St | 0.0001 | | 0.39 | 0.21 | -0.3899 | -0.21 |
EAST
| TEAM | KP REGION | KP CHAMP | PINNY REGION | PINNY CHAMP | KP-P REGION | KP-P CHAMP |
| Syracuse | 17.5 | 4.4 | 18.22 | 5.72 | -0.72 | -1.32 |
| Ohio St | 45.9 | 19.3 | 35.36 | 15.75 | 10.54 | 3.55 |
| FSU | 3.9 | 0.5 | 9.29 | 4.08 | -5.39 | -3.58 |
| Wisconsin | 16.2 | 4.2 | 9.23 | 1.96 | 6.97 | 2.24 |
| Vanderbilt | 4.9 | 0.8 | 7.92 | 2.81 | -3.02 | -2.01 |
| Cincinnati | 1.8 | 0.2 | 4.39 | 1.03 | -2.59 | -0.83 |
| Gonzaga | 1.7 | 0.1 | 2.4 | 0.59 | -0.7 | -0.49 |
| Kansas St | 3.4 | 0.4 | 4.39 | 0.98 | -0.99 | -0.58 |
| S. Miss | 0.2 | 0.006 | 0.98 | 0.34 | -0.78 | -0.334 |
| WVU | 0.8 | 0.05 | 2.4 | 0.59 | -1.6 | -0.54 |
| Texas | 2.3 | 0.2 | 2.2 | 0.52 | 0.1 | -0.32 |
| Harvard | 0.7 | 0.04 | 1.11 | 0.29 | -0.41 | -0.25 |
| Montana | 0.09 | 0.002 | 0.79 | 0.29 | -0.7 | -0.288 |
| St. Bona | 0.6 | 0.03 | 0.53 | 0.29 | 0.07 | -0.26 |
| Loyola | 0.02 | | 0.4 | 0.17 | -0.38 | -0.17 |
| UNC-Ashe | 0.03 | | 0.4 | 0.17 | -0.37 | -0.17 |
NL MVP Update
Posted by Rufio Magillicutty in Betting, MLB on September 26, 2011
The National League was easier to analyze than the American League. By that I mean the information baseball-reference has on voting points since 2000 creates a more manageable data-set, largely due to the lack of NL pitchers that have received MVP consideration. There have only been five from 2000 to 2010. Thus, I hadn’t previously included pitchers in the NL MVP Predictor.
To find the voting points, the formula basically resembles something similar to:


Where the coefficients a, b, and c are found by regressing voting points onto a number of different variables (doesn’t have to be just three) which appear to be statistically significant. The choice is up to the user, but whatever combination explains the most variance in voting points is desirable. In this case R² = .59. The typical MVP winner earns around 250-300 voting points.
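To find the probability, take the voting points of each player whose total registers as positive and divide by half the total number of voting points for all such players. Obviously if a player receives over 50% of the voting points then they will win the MVP 100% of the time. Expressed in mathematical form:
PROB = v / (0.5 * Σv)
where v is a player’s voting points and Σv is the sum of v over every player with a positive v. Because each player’s share is doubled, the probabilities across all players add up to 200%.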
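The calculation can be sketched in a few lines of Python. The point totals below are hypothetical and are not output from the regression. The function name and the explicit cap at 100% are my own reading of the description above.

```python
def mvp_probabilities(points: dict[str, float]) -> dict[str, float]:
    """Turn projected voting points into win probabilities.

    Players with non-positive points are ignored; each remaining player's
    points are divided by half the total, capped at 1.0 (assumed cap).
    """
    positive = {name: v for name, v in points.items() if v > 0}
    half_total = 0.5 * sum(positive.values())
    if half_total == 0:
        return {}
    return {name: min(1.0, v / half_total) for name, v in positive.items()}


# Hypothetical projected voting points.
example = {
    "Player A": 240.0,
    "Player B": 180.0,
    "Player C": 120.0,
    "Player D": 60.0,
    "Player E": -20.0,  # negative projection: dropped
}
probs = mvp_probabilities(example)
for name, p in probs.items():
    print(f"{name}: {p:.1%}")
```

With these numbers the four remaining players come out at 80%, 60%, 40% and 20%, which sum to 200% as the formula implies.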

Historical voting trends may soon be rendered insignificant as the new generation of sabermetrics becomes the prevailing form of player assessment. Yet to make such an assumption would not only be a general statement on my unlikely ability to gauge future voter temperament, but would also be devastating to my entire MVP Predictor. And I would assume such self-mutilating reflections are unknown to the standard-issue practices of bloggers; therefore let’s assume these assumptions were never assumed.
Like I said before, the regression excluded pitchers, so I developed an arbitrary formula of which the goal was to align the calculation of voting points with a reasonable MVP ranking after one makes a brief survey of the tabulation. I came up with this:

“Playoffs” is a binary variable, and the points added is either 30 or 0.
The formula appears to work.
NL MVP Top 15:
| NAME | Team | bWAR | WPA | PROB | ODDS |
| Ryan Braun | MIL | 7.74 | 6.20 | 30.5% | 227 |
| Matt Kemp | LAD | 9.95 | 6.00 | 25.8% | 287 |
| Prince Fielder | MIL | 4.89 | 7.00 | 21.0% | 377 |
| Justin Upton | ARI | 4.48 | 3.10 | 14.7% | 580 |
| Roy Halladay | PHI | 7.23 | 4.20 | 13.0% | 670 |
| Albert Pujols | STL | 5.71 | 4.70 | 12.6% | 696 |
| Joey Votto | CIN | 6.72 | 7.20 | 12.5% | 700 |
| Cliff Lee | PHI | 6.83 | 4.00 | 10.9% | 814 |
| Ryan Howard | PHI | 2.65 | 4.40 | 10.6% | 841 |
| Hunter Pence | PHI | 4.99 | 2.80 | 8.4% | 1089 |
| Clayton Kershaw | LAD | 7.07 | 3.70 | 8.2% | 1120 |
| Ian Kennedy | ARI | 5.60 | 4.30 | 7.3% | 1270 |
| Cole Hamels | PHI | 5.50 | 4.00 | 5.8% | 1620 |
| Lance Berkman | STL | 4.99 | 5.40 | 5.6% | 1674 |
| Shane Victorino | PHI | 5.09 | 3.10 | 4.9% | 1939 |
I think it’s safe to say either Braun or Kemp will win the MVP. And if the eventual NL Cy Young winner has any influence on the MVP, then Ryan Braun is going to be your probable winner. I find it hard to believe a third-place team will have both the Cy Young award winner and the MVP winner. The joint probability of a mediocre team also having the two best players in the league is probably very low. This way of thinking might not be justifiable: if Kemp and Kershaw are the most deserving of the respective awards, why should any other factors come into play? Again, the table above merely displays an eleven-year voting trend and nothing more.
Factors contributing to the variance of previous MVP awards were more contingent on team success, judging by the significance placed on the playoff variable (for hitters the coefficient is 78.3). But in the “MoneyBall” era, where many teams are more concerned with actually buying wins as opposed to getting stars, such a concept may spill over to the MVP voting process. If it does, then “valuable” simply means value to your team, regardless of how good the team is, and “valuable” as well as “the best” has been simplified into one all-encompassing stat, WAR. Fittingly, both leagues have a “most valuable” player on a mediocre team, if value is decided by WAR, and without that player (Kemp in NL, Bautista in AL), their respective teams would “lose” more wins. Does that make them the most “valuable?”
SP Line, WAR, and WPA
Posted by Rufio Magillicutty in Betting, MLB on August 23, 2011
Before I compared the three statistics (Line, WAR, WPA), I wanted to remove as many performance-independent factors that go into a pitcher’s average line as I possibly could. There are some things that are just out of the pitcher’s control. A pitcher who started 10 games at home and 6 on the road will have about a 20% advantage in their Vegas probability before anything else is taken into account. To adjust for home/road start discrepancy, I just multiplied the difference in home/road starts by .025, took the aggregate line, and divided by number of starts. Since HFA is set at 5%, each pitcher will have an increase or decrease of 2.5% in their line based on where they are pitching. I also had to adjust for opponents faced. This was fairly easy: the information is already in the SP report table, and an average pitcher has a Vegas probability of .5. From there the calculation is elementary.
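Here is how I read the home/road adjustment, as a small Python sketch. The function name, the sample lines and the direction of the subtraction are assumptions on my part: the adjustment (.025 times the difference in home and road starts) is removed from the aggregate line before averaging.

```python
def neutral_avg_line(home_probs, road_probs, hfa_half=0.025):
    """Average Vegas win probability per start with home-field effects removed.

    home_probs / road_probs: the pitcher's implied win probability in each
    home and road start. hfa_half is half of the 5% home-field advantage.
    """
    starts = len(home_probs) + len(road_probs)
    if starts == 0:
        raise ValueError("pitcher has no starts")
    aggregate = sum(home_probs) + sum(road_probs)
    # Each home start carries +2.5% and each road start -2.5%.
    adjustment = hfa_half * (len(home_probs) - len(road_probs))
    return (aggregate - adjustment) / starts


# Hypothetical pitcher: 10 home starts at .575, 6 road starts at .525.
avg = neutral_avg_line([0.575] * 10, [0.525] * 6)
print(round(avg, 4))
```

In this made-up case every start is worth .550 on a neutral field, so the adjusted average lands on .550 regardless of the home/road split. The opponent adjustment described above would be a separate step.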
Obviously there are other things that go into line appropriation. One being public perception, which is hard to quantify. Linemakers have a panoply of information from which to draw, I would assume. I wouldn’t be surprised if there are some that keep a database of blink duration for each player, and any peaks or troughs in duration that a player may endure throughout the course of the year. Perhaps there is some relationship between change in blink duration and performance? It’s a curious thing, simultaneous blinking. Five percent of our lives are spent walking around with our eyes closed. A sequential blinker may have an advantage in avoiding any impending danger projected, such as a spear or a rock. Why there are no sequential blinkers I don’t know. One would think sequential blinkers would reproduce differentially and would be victorious in pairwise contests with simultaneous blinkers. Or perhaps not? Maybe the sequential blinking mutation just never occurred. It’s possible blinking sequentially is an impossibility, an insight deeply rooted in the bilateral symmetry of vertebrates, or the eye protein of all organisms that are motile through a transparent spectrum.
A severe tangent, a devastating yet fascinating ramble. I can say whatever I want; it’s my blog. I would actually be willing to do a research project on the correlation between blink duration and player ability. Unfortunately nobody is stupid enough to commission such an important and groundbreaking research project, and I’m not going to do it for free.
The graphs below are actually pretty interesting, as the three statistics measure player ability from three different angles. I extracted the WAR and WPA stats from Fangraphs, using only qualified players to limit any variance and outliers. As expected, the three appear to be highly correlated with one another. WAR measures raw performance, WPA measures situational performance, and SP Line, though enigmatic, can be seen as a measure of public perception. Again, three different angles of assessing player ability. The R value is for all qualified players. Descriptive statistics at this point are limited by sample size, but I don’t see any reason why more data would lower the proportion of variance explained by the relationship, especially given which statistics are being looked at here.
The graphs basically have the same topographical qualities, which is interesting because WPA explicitly handles quantifying specific events during the course of a game, and fundamentally does not resolve player ability unlike WAR. However, since most events during the game occur while the run differential is plus or minus three, a player’s statistics will in all likelihood indicate what kind of WPA is to be expected. There are exceptions, of course (cough cough Arod cough cough, it should be noted Arod’s best WPA season was in 2007, finishing first that year in WPA and winning his third AL MVP award, further validating my inclusion of WPA into the MVP odds formula).


