Web page already open (in source format); just nee

Posted 2019-09-08 07:39

Question:

Let's say I have a tab already open in the browser. Its URL is:

view-source:http://www.google.com/webhp?source=search_app

Now that it's already open and displayed, I just want to read the text that's in the client window. (Get a context to the page, or obtain its object (as opposed to creating a new browser object), or whatever. Then just read the page.)

Is there any way to do that in Selenium or Splinter? Thanks for any help.

Answer 1:

If you are asking whether you can attach to an already open browser, then I believe the answer is "No".



Answer 2:

You can get the source of the page directly with Selenium: WebDriver.getPageSource().

But if you use view-source:url, the browser will present you with an HTML page containing the formatted source. Firefox, for example, wraps each line in a <span id="lineX"></span>. Instead of parsing this, just use getPageSource without view-source.

Please read the documentation of getPageSource carefully:

Get the source of the last loaded page. If the page has been modified after loading (for example, by Javascript) there is no guarantee that the returned text is that of the modified page. Please consult the documentation of the particular driver being used to determine whether the returned text reflects the current state of the page or the text last sent by the web server. The page source returned is a representation of the underlying DOM: do not expect it to be formatted or escaped in the same way as the response sent from the web server. Think of it as an artist's impression.
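As a rough illustration of the advice above, here is a minimal Python sketch. It assumes the Selenium Python bindings, where the `page_source` property plays the role of Java's getPageSource(). The helper names `strip_view_source`, `fetch_page_source` and `main` are made up for this example, and the choice of Firefox is only an assumption.

```python
VIEW_SOURCE_PREFIX = "view-source:"


def strip_view_source(url: str) -> str:
    """Return the URL without a leading 'view-source:' prefix, if present."""
    if url.startswith(VIEW_SOURCE_PREFIX):
        return url[len(VIEW_SOURCE_PREFIX):]
    return url


def fetch_page_source(driver, url: str) -> str:
    """Load the plain URL and return the driver's view of the page source.

    `driver` is assumed to be a Selenium WebDriver (or anything exposing
    `get(url)` and a `page_source` attribute).
    """
    driver.get(strip_view_source(url))
    return driver.page_source


def main() -> None:
    # Imported here so the helpers above work without Selenium installed.
    from selenium import webdriver

    # Assumption: Firefox and its driver are available on this machine.
    driver = webdriver.Firefox()
    try:
        html = fetch_page_source(
            driver, "view-source:http://www.google.com/webhp?source=search_app"
        )
        print(html[:200])  # show only the beginning of the source
    finally:
        driver.quit()
```

Calling `main()` opens a new browser window; it does not attach to a tab that is already open. As the quoted documentation warns, the returned text is a representation of the DOM and may differ from what the server sent.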



Answer 3:

This is what I used to do:

  1. Ask Selenium to open a browser
  2. Show a popup/message window to pause execution
  3. Open the URL in the browser and perform all the related operations manually
  4. When I'm done (i.e. on the target page), I click OK on the popup and the code resumes, performing the extraction or other tasks on the target page currently open in the browser.
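
The steps above might be sketched in Python roughly as follows. This is only an illustration under some assumptions: it uses the Selenium Python bindings, it swaps the popup/message window for a console prompt (`input`), and the names `read_after_manual_navigation` and `main` are invented for the example.

```python
def read_after_manual_navigation(driver, pause=input):
    """Wait for the user to finish navigating by hand, then read the page.

    `driver` is assumed to be a Selenium WebDriver whose browser window is
    already open. `pause` is any callable that blocks until the user
    confirms; the default is a console prompt rather than a popup.
    """
    # Steps 2-3: execution stops here while the user works in the browser.
    pause("Go to the target page in the browser, then press Enter here... ")
    # Step 4: the script resumes and reads whatever page is now displayed.
    return driver.current_url, driver.page_source


def main() -> None:
    # Imported here so the function above can be tried without Selenium.
    from selenium import webdriver

    # Step 1: let Selenium open the browser (Firefox is an assumption).
    driver = webdriver.Firefox()
    try:
        url, html = read_after_manual_navigation(driver)
        print(url)
        print(len(html), "characters of page source")
    finally:
        driver.quit()
```

Because the browser was started by Selenium in step 1, the script already holds its driver object, so there is no need to attach to an existing window. If the manual steps open a new tab or window, the driver may still be pointing at the original one.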