I have a question relating to HTML parsing. I have a website with some products and I would like to catch text within page into my current spreadsheet. This spreadsheet is quite big but contains ItemNbr in 3rd column, I expect the text in the 14th column and one row corresponds to one product (item).
My idea is to fetch the 'Material' on the webpage which is inside the Innertext after tag. The id number changes from one page to page (sometimes ).
Here is the structure of the website:
<div style="position:relative;">
<div></div>
<table id="list-table" width="100%" tabindex="1" cellspacing="0" cellpadding="0" border="0" role="grid" aria-multiselectable="false" aria-labelledby="gbox_list-table" class="ui-jqgrid-btable" style="width: 930px;">
<tbody>
<tr class="jqgfirstrow" role="row" style="height:auto">
<td ...</td>
<td ...</td>
</tr>
<tr role="row" id="1" tabindex="-1" class="ui-widget-content jqgrow ui-row-ltr">
<td ...</td>
<td ...</td>
</tr>
<tr role="row" id="2" tabindex="-1" class="ui-widget-content jqgrow ui-row-ltr">
<td ...</td>
<td ...</td>
</tr>
<tr role="row" id="3" tabindex="-1" class="ui-widget-content jqgrow ui-row-ltr">
<td ...</td>
<td ...</td>
</tr>
<tr role="row" id="4" tabindex="-1" class="ui-widget-content jqgrow ui-row-ltr">
<td ...</td>
<td ...</td>
</tr>
<tr role="row" id="5" tabindex="-1" class="ui-widget-content jqgrow ui-row-ltr">
<td ...</td>
<td ...</td>
</tr>
<tr role="row" id="6" tabindex="-1" class="ui-widget-content jqgrow ui-row-ltr">
<td ...</td>
<td ...</td>
</tr>
<tr role="row" id="7" tabindex="-1" class="ui-widget-content jqgrow ui-row-ltr">
<td role="gridcell" style="padding-left:10px" title="Material" aria-describedby="list-table_">Material</td>
<td role="gridcell" style="" title="600D polyester." aria-describedby="list-table_">600D polyester.</td>
</tr>
<tr ...>
</tr>
</tbody>
</table> </div>
I would like to get "600D Polyester" as a result.
My (not working) code snippet is as is:
Sub ParseMaterial()
Dim Cell As Integer
Dim ItemNbr As String
Dim AElement As Object
Dim AElements As IHTMLElementCollection
Dim IE As MSXML2.XMLHTTP60
Set IE = New MSXML2.XMLHTTP60
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLBody As MSHTML.HTMLBody
Set HTMLDoc = New MSHTML.HTMLDocument
Set HTMLBody = HTMLDoc.body
For Cell = 1 To 5 'I iterate through the file row by row
ItemNbr = Cells(Cell, 3).Value 'ItemNbr isin the 3rd Column of my spreadsheet
IE.Open "GET", "http://www.example.com/?item=" & ItemNbr, False
IE.send
While IE.ReadyState <> 4
DoEvents
Wend
HTMLBody.innerHTML = IE.responseText
Set AElements = HTMLDoc.getElementById("list-table").getElementsByTagName("tr")
For Each AElement In AElements
If AElement.Title = "Material" Then
Cells(Cell, 14) = AElement.nextNode.value 'I write the material in the 14th column
End If
Next AElement
Application.Wait (Now + TimeValue("0:00:2"))
Next Cell
Thanks for your help !
Not related to tables or Excel ( I use MS-Access 2013) but directly related to the topic title. My solution is
Just a couple things that hopefully will get you in the right direction:
clean up a bit: remove the readystate property testing loop. The value returned by the readystate property will never change in this context - code will pause after the send instruction, to resume only once the server response is received, or has failed to do so. The readystate property will be set accordingly, and the code will resume execution. You should still test for the ready state, but the loop is just unnecessary
target the right HTML elements: you are searching through the tr elements - while the logic of how you use these elements in your code actually looks to point to td elements
make sure the properties are actually available for the objects you are using them on: to help you with this, try and declare all your variable as specific objects instead of the generic Object. This will activate intellisense. If you have a difficult time finding the actual name of your object as defined in the relevant library in a first place, declare it as the generic Object, run your code, and then inspect the type of the object - by printing typename(your_object) to the debug window for instance. This should put you on your way
I have also included some code below that may help. If you still can't get this to work and you can share your urls - plz do that.
EDIT: as a follow-up to the further information you provided in the comment below - and the additionnal comments I have added
EDIT2: an alternative automating IE