Parsing: Can I pick up the URL of embedded CSS Bac

Posted 2019-08-13 13:34

The HTML I am parsing contains tables whose background images are set with inline CSS. Can I use Nokogiri to extract the URL from that CSS? Here is a snippet of the HTML I'd like to parse:

tl;dr: I'd like to get the .png URL out of this HTML snippet using Nokogiri.

<table border="0" cellspacing="0" cellpadding="0" width="300" height="300" background="http://s3.amazonaws.com/static.example.com/sale/homepage/3166-300x300-1328107072.png" style="background-image:url('http://s3.amazonaws.com/static.example.com/sale/homepage/3166-300x300-1328107072.png');background-repeat:no-repeat;background-color:#cacaca">
<tbody><tr>
<td>
<table background="http://s3.amazonaws.com/static.example.com/relaunch/transparent-strip1_1x1.png" style="background-image:url('http://s3.amazonaws.com/static.example.com/relaunch/transparent-strip1_1x1.png');background-repeat:repeat;background-color:transparent" border="0" cellpadding="0" cellspacing="0">
<tbody><tr>
<td style="vertical-align:middle" width="260" height="60">
<span style="font-family:Arial,Helvetica,sans-serif;font-size:13px;padding:2px 5px 0 10px;font-weight:bold;display:block;color:#ffffff">Kristins Gifts</span>
<span style="font-family:Arial,Helvetica,sans-serif;font-size:12px;padding:2px 5px 0 10px;line-height:16px;display:block;color:#ffffff">Stationery to Explore</span>
</td>
</tr>
</tbody></table>
</td>
<td>
<table background="http://s3.amazonaws.com/static.example.com/relaunch/transparent-strip1_1x1.png" style="background-image:url('http://s3.amazonaws.com/static.example.com/relaunch/transparent-strip1_1x1.png');background-repeat:repeat;background-color:transparent" border="0" cellpadding="0" cellspacing="0">
<tbody><tr>
<td style="vertical-align:top;text-align:right" width="50" height="60">
<span style="display:block;padding:18px 16px 0 0"><a href="http://mailer.example.com/clzh.7n1p/Ty4bBi0W_QUigx74Be7d5" alt="Stationery to Explore" title="Stationery to Explore" style="display:inline-block;outline:none" target="_blank"><img src="http://s3.amazonaws.com/static.example.com/relaunch/sales-arrow-button.png" alt=" &gt; " height="23" width="23" style="border:0"></a></span>
</td>
</tr>
</tbody></table>
</td>
</tr>
<tr>
<td colspan="2" height="240">
<a href="http://mailer.example.com/clzh.7n1p/Ty4bBi0W_QUigx74C5096" alt="Stationery to Explore" title="Stationery to Explore" style="width:100%;min-height:240px;display:block;outline:none" target="_blank"></a>
</td>
</tr>
</tbody></table>

1 answer
倾城 Initia
#2 · 2019-08-13 14:41

In this case you don't have to look at the CSS at all; you can pull the image URL straight from the background attribute on each <table> node:

>> doc = Nokogiri::HTML(html)
>> doc.css('table').each { |n| puts n[:background] }
http://s3.amazonaws.com/static.example.com/sale/homepage/3166-300x300-1328107072.png
http://s3.amazonaws.com/static.example.com/relaunch/transparent-strip1_1x1.png
http://s3.amazonaws.com/static.example.com/relaunch/transparent-strip1_1x1.png
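If the markup only carried the URL in the style attribute (no background attribute), one possible fallback is to pull it out of the CSS text with a regular expression. The sketch below is plain Ruby and makes some assumptions: `css_urls` and `CSS_URL` are hypothetical names, not part of Nokogiri, and the pattern only handles simple `url(...)` values without escaped quotes or nested parentheses.

```ruby
# Hypothetical helper (not part of Nokogiri): return every url(...) value
# found in an inline CSS declaration string.
# Assumption: URLs contain no escaped quotes or nested parentheses.
CSS_URL = /url\(\s*(['"]?)(.*?)\1\s*\)/i

def css_urls(style)
  # scan yields [quote, url] pairs because the pattern has two groups;
  # nil (a node without a style attribute) becomes an empty string.
  style.to_s.scan(CSS_URL).map { |_quote, url| url }
end

# The style string is copied from the inner <table> in the question.
style = "background-image:url('http://s3.amazonaws.com/static.example.com/relaunch/transparent-strip1_1x1.png');" \
        "background-repeat:repeat;background-color:transparent"

puts css_urls(style)
```

With a parsed document, this could presumably be applied to every styled node with something like `doc.css('[style]').flat_map { |n| css_urls(n[:style]) }`. For anything beyond simple inline styles, a real CSS parser would be more robust than a regex.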