How to save complete html page with frames/iframes

Posted 2019-08-09 08:11

While web scraping, I want to save the current page's HTML to a file for later debugging. browser.html works in most cases, but when the page contains an iframe/frame, its content is not returned by browser.html; I have to get it separately with something like browser.iframe.html. There are also cases where an iframe contains another iframe. I can find every frame recursively and save its content, but separate files won't be very useful because I don't know the exact structure of the page.

For example I have the following page:

<!DOCTYPE html>
<html>
<head>
</head>
  <frameset cols="50%,20%,30%">
     <frame name="left" src="/html/left_frame.htm" />
     <frame name="right" src="/html/right_frame.htm" />
     <noframes>
       <body>
          Your browser does not support frames.
       </body>
     </noframes>
     <frame src="http://example.com"/>
  </frameset>
</html>

I want to save it to a file using Watir. Any ideas?

1 answer
地球回转人心会变
Post #2 · 2019-08-09 09:11

Frames act much like completely separate web pages. While you can see a frame's content in the rendered document and in the DOM, the contents of a frame are not technically part of the page's HTML. You can see this in the browser: right-click the main document and view its source, then compare that with what you get when you right-click content inside a frame and view the frame's source.

To write all the html out to files, you are likely going to need to make a method that writes out html of a frame, looks for other frames, and calls the same method recursively on any frames found inside.
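A minimal sketch of that recursive method in Ruby, assuming the Watir objects respond to html, frames and iframes (Watir::Browser and Watir::IFrame do; the question itself uses browser.iframe.html). The names collect_frame_html and dump_frames, the path labels such as top/frame_0/iframe_1, and the combined.html file are hypothetical choices for illustration. To address the concern that separate files lose the page structure, each document is labelled with its position in the frame tree, both in its file name and in a comment inside one combined file.

```ruby
require "fileutils"

# Recursively collect the HTML of a page and of every frame/iframe nested in it.
# ASSUMPTION: `container` responds to #html, #frames and #iframes, as
# Watir::Browser and Watir's frame/iframe elements are expected to.
# Returns an array of [path, html] pairs; the (hypothetical) path records where
# each document sits in the frame tree, e.g. "top/frame_0/iframe_1".
def collect_frame_html(container, path = "top")
  results = [[path, container.html]]
  container.frames.each_with_index do |frame, i|
    results.concat(collect_frame_html(frame, "#{path}/frame_#{i}"))
  end
  container.iframes.each_with_index do |iframe, i|
    results.concat(collect_frame_html(iframe, "#{path}/iframe_#{i}"))
  end
  results
end

# Write each document to its own file named after its path, plus one
# combined.html in which every document is preceded by a comment with its path.
# Returns the list of paths in the order they were visited.
def dump_frames(container, dir)
  FileUtils.mkdir_p(dir)
  docs = collect_frame_html(container)
  docs.each do |path, html|
    File.write(File.join(dir, "#{path.gsub('/', '__')}.html"), html)
  end
  combined = docs.map { |path, html| "<!-- #{path} -->\n#{html}" }.join("\n")
  File.write(File.join(dir, "combined.html"), combined)
  docs.map(&:first)
end
```

With a live browser you would call it after navigating, for example dump_frames(browser, "debug_html") after browser.goto(url). Within each container all frame elements are visited before iframe elements, so the order in combined.html follows the frame tree rather than strict source order.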

Alternatively, look at a gem like Nokogiri, which is designed to parse HTML. It might have better methods for this sort of thing, or existing examples of how to do what you want.
