Suppose a web page contains the following values:
<td> <a href="https://www.test.com/test123/a.html"> test11 </a> </td>
<td> <a href="https://www.test.com/test12333/r.html"> test12 </a> </td>
<td> <a href="https://www.test.com/testaa123/t.html"> test21 </a> </td>
<td> <a href="https://www.test.com/test123123/b.html"> test31 </a> </td>
Is there any way to find the value test21 using Ruby?
Or is there any way to find the href that contains the substring /testaa123/t.html?
Try this tutorial, which introduces Nokogiri.
For example, for an <li> tag:
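require 'rubygems'
require 'nokogiri'
require 'open-uri'
PAGE_URL = "http://ruby.bastardsbook.com/files/hello-webpage.html"
# Fetch and parse the page (URI.open is the open-uri entry point on Ruby 2.5+; older code used a bare open)
page = Nokogiri::HTML(URI.open(PAGE_URL))
# Print the text of the first <li> element
puts page.css('li')[0].text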
This prints YouTube, the text of the first <li> in the following part of that page:
<div id="funstuff">
<p>Here are some entertaining links:</p>
<ul>
<li><a href="http://youtube.com">YouTube</a></li>
<li><a data-category="news" href="http://reddit.com">Reddit</a></li>
<li><a href="http://kathack.com/">Kathack</a></li>
<li><a data-category="news" href="http://www.nytimes.com">New York Times</a></li>
</ul>
</div>