I am trying to extract prices from this HTML page using the VBA code below:
Here's the HTML snippet:
<div class="box-text box-text-products">
<div class="title-wrapper">
<p class="category uppercase is-smaller no-text-overflow product-cat op-7">
Xikar Lighters
</p>
<p class="name product-title">
<a href="https://www.havanahouse.co.uk/product/xikar-allume-single-jet-flame-racing-cigar-lighter-bluewhite-stripe/">Xikar Allume Single Jet Flame Racing Cigar Lighter – Blue/White Stripe</a>
</p>
</div>
<div class="price-wrapper">
<span class="price">
<del>
<span class="woocommerce-Price-amount amount">
<span class="woocommerce-Price-currencySymbol">£</span>48.00
</span>
</del>
<ins>
<span class="woocommerce-Price-amount amount">
<span class="woocommerce-Price-currencySymbol">£</span>45.00
</span>
</ins>
</span>
</div>
</div>
<!-- box-text --></div><!-- box --></div><!-- .col-inner --></div><!-- col -->
I am using the code below, but I get an error:
For Each oElement In oHtml.getElementsByClassName("woocommerce-Price-amount amount")
If oElement.getElementsByTagName("del") Then Exit For
If oElement.innerText <> 0 Then
Cells(counter, 3) = CDbl(oElement.innerText)
counter = counter + 1
End If
Next oElement
Take a look at the example below:
The output for me is as follows:
Regular expressions are generally not recommended for parsing HTML, so take this as a disclaimer. The data being processed in this case is simple enough that RegEx is adequate for it. About RegEx: introduction (especially syntax), introduction JS, VB flavor.
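To illustrate the RegEx idea on the snippet from the question, here is a minimal sketch; it is not necessarily the approach of the original example. It assumes the page HTML is already available as a string and that prices always follow the `woocommerce-Price-currencySymbol` span, as in the snippet. The names `ExtractCurrentPrices` and `DemoExtractCurrentPrices` are hypothetical, and the sample string is a shortened copy of the question's markup. The struck-through prices inside `<del>` are removed first, so only the current price of each product is captured.

```vba
Option Explicit

' Hypothetical helper: returns the current (not struck-through) prices
' found in an HTML string as a Collection of Doubles.
Function ExtractCurrentPrices(ByVal sHtml As String) As Collection
    Dim oRegEx As Object
    Dim oMatch As Object
    Dim cPrices As New Collection

    ' Late binding, so no reference to the VBScript RegExp library is required
    Set oRegEx = CreateObject("VBScript.RegExp")
    oRegEx.Global = True
    oRegEx.IgnoreCase = True

    ' Step 1: drop every <del>...</del> block (the old price).
    ' [\s\S] is used because "." does not match line breaks.
    oRegEx.Pattern = "<del>[\s\S]*?</del>"
    sHtml = oRegEx.Replace(sHtml, "")

    ' Step 2: capture the number that follows the currency symbol span
    oRegEx.Pattern = "woocommerce-Price-currencySymbol"">[^<]*</span>\s*(\d+(?:\.\d+)?)"
    For Each oMatch In oRegEx.Execute(sHtml)
        ' Val always treats "." as the decimal separator, unlike CDbl
        cPrices.Add Val(oMatch.SubMatches(0))
    Next oMatch

    Set ExtractCurrentPrices = cPrices
End Function

' Hypothetical demo: writes the extracted prices to column C of the active sheet
Sub DemoExtractCurrentPrices()
    Dim sHtml As String
    Dim vPrice As Variant
    Dim counter As Long

    ' Shortened copy of the price markup from the question
    sHtml = "<span class=""price"">" & _
            "<del><span class=""woocommerce-Price-amount amount"">" & _
            "<span class=""woocommerce-Price-currencySymbol"">&pound;</span>48.00</span></del>" & _
            "<ins><span class=""woocommerce-Price-amount amount"">" & _
            "<span class=""woocommerce-Price-currencySymbol"">&pound;</span>45.00</span></ins>" & _
            "</span>"

    counter = 1
    For Each vPrice In ExtractCurrentPrices(sHtml)
        Cells(counter, 3).Value = vPrice
        counter = counter + 1
    Next vPrice
End Sub
```

For the sample string this should write 45 to C1 and skip the old price of 48.00. Removing the `<del>` blocks before matching also keeps products without a discount, which have no `<ins>` element at all. The pattern does not handle thousands separators such as `1,250.00`.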
BTW, there are other answers using a similar approach: 1, 2, 3, 4, 5.