Grabbing Pinnacle Lines with Excel and DOM


Don’t know why I haven’t explained scraping through DOM and XML before. Actually for some reason I was under the impression I had posted this before, its actually been saved as a draft on my wordpress admin page for a few months now, I guess I forgot to finish.

DOM (Document Object Model) is one of the many HTML/XML to programming language routines that use a common toolbox for which to call its properties.  Fortunately for us, Visual Basic, Java, PHP, etc… all integrate these toolbox routines in order to grab and manipulate information given a particular XML/HTML source.  For a general information scraper, its one of the more expansive and user-friendly, though with simplicity comes limitations.  However, such limitations pose very little concern for extracting basic information from a website.  DOM doesn’t necessarily promote dynamic streaming information parsing, because DOM typically requires the document source to be loaded in complete form into memory.  If you think of DOM as a branching tree, with each element a branch, or “child”, or its root, or “parent”, then its obvious why the information must be in some way stored in memory before the various DOM elements can be called.

DOM can be more useful in extracting box scores or season statistics.  Such data is in a complete state and does not require any other web and programming integration to grab information that is changing, as opposed to line movement or live game updates.

Below is the code for extracting NCAA lines and matchups via Pinnacle’s XML feed using Excel.  Excel’s VBA Modules allows these DOM properties to be implemented, after referencing the relevant toolbox routines.  So before accessing the DOM methods, create a reference to Microsoft XML, v6.0

Pinnacle’s full XML feed is located here, and they are kind enough to offer some assistance on how to get different lines from the different sports offered.

The URL parameters are given above, so to access solely the lines for NCAA football, use the following parameters:

http://xml.pinnaclesports.com/pinnaclefeed.aspx?sporttype=Football&sportsubtype=NCAA

XML files are configured, as mentioned before, as subsequent branches from a tree, whose root is the document source. The DOM properties to access each branch are in a rather English familial vernacular, to make it human-friendly. For example, the following XML sequence:

<game>
<date>10/23</date>
<hometeam>USC</hometeam>
<roadeam>Oregon</roadteam>
</game>

Contains the “game” element, or node, as the parent to its three children, or child node, “date”, “hometeam”, and “roadteam”. The latter three are designated as siblings.

Each element in the document is numbered sequentially from an initial value of zero. Therefore to grab just the home team from above, its DOM properties translate to the second child node, childnode(1), of the parent node “game”.

The VB code is honestly rather self explanatory from there. Some trial and error and a google search or two is needed in order to find the right properties and naming conventions to fit the programming language.  Excel is user-friendly enough to provide all the properties available for the particular function required.

The lines:

    Do Until xmlLoad.readyState = 4
    Loop

Simply means do not continue until the document is completely loaded into memory (readyState = 4).

Remove the comment from the final line to run the program every two minutes.

The final result should look something like this:

Share
  1. #1 by imabev on July 25, 2011 - 9:10 am

    Nice article and good stuff on this blog as well. With BetJamaica totally changing their format my sports odds source that runs my website dried up. I am going to be testing the pinnacle xml scrape today…I need to grab all of the info from the xml but your code gives me some good ideas.

  2. #2 by KMc101217 on October 31, 2012 - 3:45 pm

    Hi mate,

    Nice article.

    I am a complete VBA programming virgin, and am having a problem with the code above. I’m wondering if you could help?

    I am getting a “Compile Error: Syntax Error” for the following two lines:

    Sheets(“Sheet1″).Cells(Rows.Count, 3).End(xlUp)(2, 1) = events.LastChild.FirstChild.SelectSingleNode(“spread”).FirstChild.Text & ” ” & events.LastChild.FirstChild.SelectSingleNode(“spread”).ChildNodes(1).Text
    Sheets(“Sheet1″).Cells(Rows.Count, 5).End(xlUp)(2, 1) = events.LastChild.FirstChild.SelectSingleNode(“spread”).ChildNodes(2).Text & ” ” & events.LastChild.FirstChild.SelectSingleNode(“spread”).ChildNodes(3).Text

    Any idea what the problem may be?

  3. #3 by Sirio on March 20, 2013 - 7:25 am

    Buenos dias, desde Venezuela! Muchas gracias.. Me sirvio tu ejemplo!

You must be logged in to post a comment.