How to scrape web page data in VBA

  • Follow


I need EXCEL to open a web address and  parse out a few data items from that 
web location.  What EXCEL macro function(s) can be used to open a web URL and 
parse the page source?

The web pages in question are not set up to allow the external date 
functions to find the needed data.
0
Reply Utf 3/31/2010 7:47:01 PM

 Use joels dump sub to ascertain the tag name or ID for the data you want:
with worksheets ("sheet1")
 RowCount = 1  
      for each itm in IE.Document.all
         Range("A" & RowCount) = itm.tagname
         Range("B" & RowCount) = itm.classname
         Range("C" & RowCount) = itm.id       'comment out if errors
         Range("D" & RowCount) = itm.name     'comment out if errors
         Range("E" & RowCount) = left(itm.innertext,1024)
         RowCount = Rowcount + 1
      next itm
end with
once you have this information you can reference the data in  your sub
look at the following piece of zip code sub as an example

Set Form = IE.document.getElementsByTagname("Form")

Set zip5 = IE.document.getElementById("zip5")
zip5.Value = ZIPCODE


Set ZipCodebutton = Form(0).onsubmit

Form(0).submit
Do While IE.busy = True
   DoEvents
Loop

Set Table = IE.document.getElementsByTagname("Table")
Location = Table(0).Rows(2).innertext
IE.Quit
MsgBox ("Zip code = " & ZIPCODE & " City/State = " & Location)

good luck

"Mike" wrote:

> I need EXCEL to open a web address and  parse out a few data items from that 
> web location.  What EXCEL macro function(s) can be used to open a web URL and 
> parse the page source?
> 
> The web pages in question are not set up to allow the external date 
> functions to find the needed data.
0
Reply Utf 3/31/2010 8:48:01 PM


Here is one I used to parse out some data. I never got it working as well as 
I wanted; if you are working with simple pages it shouldn't be too hard, but 
I was trying to parse a page that the designer specifically didn't want 
parsed, and I was having trouble getting Excel to grab 100% of the page 
content for parsing. The code below grabbed the active page and used Regex to 
parse it. If you use Regex you will need to set a reference to it in the VBE.
HTH,
Keith

Sub GetTheData()

Dim strHTML As String
Dim strPartURL As String
Dim strPageHTML As String

strPartURL = "providername"

Set objShell = CreateObject("Shell.Application")
Set objShellWindows = objShell.Windows

If objShellWindows.Count = 0 Then
    Exit Sub
End If

For i = 0 To objShellWindows.Count - 1
    Set objIE = objShellWindows.Item(i)
    If InStr(objIE.LocationURL, strPartURL) Then
        'Creates a TextRange object for the element.
        Set rng = objIE.document.body.createTextRange
'        Set rng2 = objIE.document.body.parentelement
'        Str2 = rng2.Text
        strPageHTML = rng.Text
    End If
Next i

Dim re As RegExp
Dim s As String
Dim ObjID As String
Dim matches As MatchCollection
Dim mcmatch As Match

s1 = strPageHTML
's = Replace(s, Chr(10), Chr(32))
's = Replace(s, Chr(13), Chr(32))
's = Replace(s, "Here to View More Results", "CustomDeletedString")
's = Replace(s, " View ", " ViewView")
Debug.Print s1

'grab the long string to find LSVs
Set re = New RegExp
re.IgnoreCase = True
re.Global = True
re.MultiLine = True
re.Pattern = "Family[\s\S]*?View"

Set s2 = re.Execute(s)
Debug.Print s2

Set re2 = New RegExp
re.IgnoreCase = True
re.Global = True
re.MultiLine = True
re.Pattern = "Family|View[\s\S]*?View" '(gives full string ) 
"Family|View[\s\S]*?View

Set matches = re.Execute(s)
For Each mcmatch In matches
tempstr = mcmatch
Trim (tempstr)
tempstr = Left(tempstr, Len(tempstr) - 4)
tempstr = Trim(Right(tempstr, Len(tempstr) - 4))
MsgBox (tempstr)
'MsgBox Asc(Mid(tempstr, 2, 1))
Next

Set objShellWindows = Nothing
Set objShell = Nothing
End Sub





"Mike" wrote:

> I need EXCEL to open a web address and  parse out a few data items from that 
> web location.  What EXCEL macro function(s) can be used to open a web URL and 
> parse the page source?
> 
> The web pages in question are not set up to allow the external date 
> functions to find the needed data.
0
Reply Utf 3/31/2010 10:36:52 PM

On Mar 31, 1:47=A0pm, Mike <M...@discussions.microsoft.com> wrote:
> I need EXCEL to open a web address and =A0parse out a few data items from=
 that
> web location. =A0What EXCEL macro function(s) can be used to open a web U=
RL and
> parse the page source?
>
> The web pages in question are not set up to allow the external date
> functions to find the needed data.

Here's another method.  It doesn't involve opening and closing IE so
it runs faster.  All of the web page information can be extracted from
the source code (using instr, mid, etc) which is contained in
my_var...Ron

Sub Test()
    my_url =3D "http://www.google.com"
    Set my_obj =3D CreateObject("MSXML2.XMLHTTP")
    my_obj.Open "GET", my_url, False
    my_obj.send
    my_var =3D my_obj.responsetext
    Set my_obj =3D Nothing
End Sub
0
Reply ron 3/31/2010 10:53:44 PM

Ron's code is much better than mine if you have a static web address that you 
can poll directly. Mine was more complicated because I had to follow a 
complicated login and navigation process to reach the page I needed to parse. 
If you have the option, use the simpler code :)

"ker_01" wrote:

> Here is one I used to parse out some data. I never got it working as well as 
> I wanted; if you are working with simple pages it shouldn't be too hard, but 
> I was trying to parse a page that the designer specifically didn't want 
> parsed, and I was having trouble getting Excel to grab 100% of the page 
> content for parsing. The code below grabbed the active page and used Regex to 
> parse it. If you use Regex you will need to set a reference to it in the VBE.
> HTH,
> Keith
> 
> Sub GetTheData()
> 
> Dim strHTML As String
> Dim strPartURL As String
> Dim strPageHTML As String
> 
> strPartURL = "providername"
> 
> Set objShell = CreateObject("Shell.Application")
> Set objShellWindows = objShell.Windows
> 
> If objShellWindows.Count = 0 Then
>     Exit Sub
> End If
> 
> For i = 0 To objShellWindows.Count - 1
>     Set objIE = objShellWindows.Item(i)
>     If InStr(objIE.LocationURL, strPartURL) Then
>         'Creates a TextRange object for the element.
>         Set rng = objIE.document.body.createTextRange
> '        Set rng2 = objIE.document.body.parentelement
> '        Str2 = rng2.Text
>         strPageHTML = rng.Text
>     End If
> Next i
> 
> Dim re As RegExp
> Dim s As String
> Dim ObjID As String
> Dim matches As MatchCollection
> Dim mcmatch As Match
> 
> s1 = strPageHTML
> 's = Replace(s, Chr(10), Chr(32))
> 's = Replace(s, Chr(13), Chr(32))
> 's = Replace(s, "Here to View More Results", "CustomDeletedString")
> 's = Replace(s, " View ", " ViewView")
> Debug.Print s1
> 
> 'grab the long string to find LSVs
> Set re = New RegExp
> re.IgnoreCase = True
> re.Global = True
> re.MultiLine = True
> re.Pattern = "Family[\s\S]*?View"
> 
> Set s2 = re.Execute(s)
> Debug.Print s2
> 
> Set re2 = New RegExp
> re.IgnoreCase = True
> re.Global = True
> re.MultiLine = True
> re.Pattern = "Family|View[\s\S]*?View" '(gives full string ) 
> "Family|View[\s\S]*?View
> 
> Set matches = re.Execute(s)
> For Each mcmatch In matches
> tempstr = mcmatch
> Trim (tempstr)
> tempstr = Left(tempstr, Len(tempstr) - 4)
> tempstr = Trim(Right(tempstr, Len(tempstr) - 4))
> MsgBox (tempstr)
> 'MsgBox Asc(Mid(tempstr, 2, 1))
> Next
> 
> Set objShellWindows = Nothing
> Set objShell = Nothing
> End Sub
> 
> 
> 
> 
> 
> "Mike" wrote:
> 
> > I need EXCEL to open a web address and  parse out a few data items from that 
> > web location.  What EXCEL macro function(s) can be used to open a web URL and 
> > parse the page source?
> > 
> > The web pages in question are not set up to allow the external date 
> > functions to find the needed data.
0
Reply Utf 3/31/2010 11:48:01 PM

4 Replies
5073 Views

(page loaded in 0.012 seconds)

Similiar Articles:
















7/23/2012 12:29:51 PM


Reply: