Don’t know why I haven’t explained scraping through DOM and XML before. Actually for some reason I was under the impression I had posted this before, its actually been saved as a draft on my wordpress admin page for a few months now, I guess I forgot to finish.
DOM (Document Object Model) is one of the many HTML/XML to programming language routines that use a common toolbox for which to call its properties. Fortunately for us, Visual Basic, Java, PHP, etc… all integrate these toolbox routines in order to grab and manipulate information given a particular XML/HTML source. For a general information scraper, its one of the more expansive and user-friendly, though with simplicity comes limitations. However, such limitations pose very little concern for extracting basic information from a website. DOM doesn’t necessarily promote dynamic streaming information parsing, because DOM typically requires the document source to be loaded in complete form into memory. If you think of DOM as a branching tree, with each element a branch, or “child”, or its root, or “parent”, then its obvious why the information must be in some way stored in memory before the various DOM elements can be called.
DOM can be more useful in extracting box scores or season statistics. Such data is in a complete state and does not require any other web and programming integration to grab information that is changing, as opposed to line movement or live game updates.
Below is the code for extracting NCAA lines and matchups via Pinnacle’s XML feed using Excel. Excel’s VBA Modules allows these DOM properties to be implemented, after referencing the relevant toolbox routines. So before accessing the DOM methods, create a reference to Microsoft XML, v6.0
Pinnacle’s full XML feed is located here, and they are kind enough to offer some assistance on how to get different lines from the different sports offered.
XML files are configured, as mentioned before, as subsequent branches from a tree, whose root is the document source. The DOM properties to access each branch are in a rather English familial vernacular, to make it human-friendly. For example, the following XML sequence:
Contains the “game” element, or node, as the parent to its three children, or child node, “date”, “hometeam”, and “roadteam”. The latter three are designated as siblings.
Each element in the document is numbered sequentially from an initial value of zero. Therefore to grab just the home team from above, its DOM properties translate to the second child node, childnode(1), of the parent node “game”.
The VB code is honestly rather self explanatory from there. Some trial and error and a google search or two is needed in order to find the right properties and naming conventions to fit the programming language. Excel is user-friendly enough to provide all the properties available for the particular function required.
Sub XML_Pinny() Dim xmlLoad As New MSXML2.DOMDocument Dim allevents As IXMLDOMNode Dim eventslen As Integer Dim events As IXMLDOMNode Dim XMLHttpRequest As MSXML2.xmlhttp Dim i As Integer Dim URL As String Application.Calculation = xlCalculationManual Application.ScreenUpdating = False On Error Resume Next URL = "http://xml.pinnaclesports.com/pinnaclefeed.aspx?sporttype=Football&sportsubtype=NCAA" Set XMLHttpRequest = New MSXML2.xmlhttp XMLHttpRequest.Open "GET", URL, False XMLHttpRequest.send Set xmlLoad = New MSXML2.DOMDocument Do Until xmlLoad.readyState = 4 Loop xmlLoad.LoadXML (XMLHttpRequest.responseText) Sheets("Sheet1").Range("A1").Value = xmlLoad.DocumentElement.ChildNodes(0).FirstChild.Text Set allevents = xmlLoad.DocumentElement.ChildNodes(3) eventslen = allevents.ChildNodes.Length Sheets("Sheet1").Range("B1").Value = eventslen i = 0 For i = 0 To (eventslen - 1) Step 1 Set events = allevents.ChildNodes(i) Sheets("Sheet1").Cells(Rows.Count, 1).End(xlUp)(2, 1) = events.FirstChild.Text Sheets("Sheet1").Cells(Rows.Count, 2).End(xlUp)(2, 1) = events.ChildNodes(5).ChildNodes(0).FirstChild.Text Sheets("Sheet1").Cells(Rows.Count, 4).End(xlUp)(2, 1) = events.ChildNodes(5).ChildNodes(1).FirstChild.Text Sheets("Sheet1").Cells(Rows.Count, 3).End(xlUp)(2, 1) = events.LastChild.FirstChild.SelectSingleNode("spread").FirstChild.Text & " " & events.LastChild.FirstChild.SelectSingleNode("spread").ChildNodes(1).Text Sheets("Sheet1").Cells(Rows.Count, 5).End(xlUp)(2, 1) = events.LastChild.FirstChild.SelectSingleNode("spread").ChildNodes(2).Text & " " & events.LastChild.FirstChild.SelectSingleNode("spread").ChildNodes(3).Text Next i Set xmlLoad = Nothing Set allevents = Nothing Application.Calculation = xlCalculationAutomatic Application.ScreenUpdating = True 'Application.OnTime Now + TimeValue("00:02:00"), "XML_Pinny" End Sub
Do Until xmlLoad.readyState = 4 Loop
Simply means do not continue until the document is completely loaded into memory (readyState = 4).
Remove the comment from the final line to run the program every two minutes.
The final result should look something like this: