how to navigate to other pages when pagination exi

Posted 2019-08-30 01:37

Question:

I am reading the content of a web page from a URL (http://myURL.com), but I can only read the content of page 1. When I pass the page 2 URL of the paginated list to the jsoup API, the printed output is still the content of page 1, even though opening the same page 2 URL in a web browser shows the content of page 2. Any suggestions on how to read the content of the other pages when the list is paginated?

Original URL :

http://myURL.com/myDocs/forms/AllItems.aspx?RootFolder=%2fsites%2docs3%2fmiscc%20Documents%2fstatus%20yearly%2f2017&FolderCTID=0x012906D46689EQWEPKA

URL of page 2 (after clicking the Next button to go to page 2 of the paginated list):

http://myURL.com/myDocs/forms/AllItems.aspx?RootFolder=%2fsites%2docs3%2fmiscc%20Documents%2fstatus%20yearly%2f2017&FolderCTID=0x012906D46689EQWEPKA#InplviewHash038662ba-180e-41fc-8ad6-8b9805aa1b8b=Paged%3DTRUE-p_SortBehavior%3D0-p_FileLeafRef%3DGM%255fSW%2520TEAM%255fProgram%255fStatus%255f20170821%255fvFNAL%252epdf-p_ID%3D85-PageFirstRow%3D31-RootFolder%3D%252fsites%252fijjhhj3%252fyeal%2520Documents%252fstatus%2520Report%252f2017

Java code:

    import java.io.IOException;

    import org.jsoup.Jsoup;

    public class Tester {
        private static final String page1URL = "http://myURL.com/myDocs/forms/AllItems.aspx?RootFolder=%2fsites%2docs3%2fmiscc%20Documents%2fstatus%20yearly%2f2017&FolderCTID=0x012906D46689EQWEPKA";

        private static final String page2URL = "http://myURL.com/myDocs/forms/AllItems.aspx?RootFolder=%2fsites%2docs3%2fmiscc%20Documents%2fstatus%20yearly%2f2017&FolderCTID=0x012906D46689EQWEPKA#InplviewHash038662ba-180e-41fc-8ad6-8b9805aa1b8b=Paged%3DTRUE-p_SortBehavior%3D0-p_FileLeafRef%3DGM%255fSW%2520TEAM%255fProgram%255fStatus%255f20170821%255fvFNAL%252epdf-p_ID%3D85-PageFirstRow%3D31-RootFolder%3D%252fsites%252fijjhhj3%252fyeal%2520Documents%252fstatus%2520Report%252f2017";

        public static void main(String[] args) throws IOException {
            // Fetch and print the page; passing page2URL here still prints page 1
            org.jsoup.nodes.Document doc = Jsoup.connect(page1URL).get();
            System.out.println(doc);
        }
    }

In the code above, when I pass page2URL instead, it still prints the content of page 1 only, but the same URL opened in a browser shows the content of page 2. Is this because page2URL is the URL produced by clicking the Next button on page 1 (pagination)?

PS: page2URL is the same as page1URL with an extra part appended (#InplviewHash03....); please compare the two URLs to see the difference.

Answer 1:

I suggest reading up on the meaning of # in a URL. It was originally meant as an anchor within a page, so that the browser could jump straight to that element. The part after the # is not sent to the server as part of the request. These days it is also used for AJAX, because that part can be read via JavaScript. For reference, see "What is the meaning of # in URL and how can i use that?"
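
To make this concrete, here is a minimal sketch of what happens to the part after the # when a URL is requested. The class name FragmentDemo, the helper requestedUrl, and the shortened URLs are hypothetical and only stand in for the ones in the question; the point is that both URLs reduce to the same request, which would explain why the server answers with page 1 both times.

```java
public class FragmentDemo {
    /** Returns the URL as it is requested from the server: everything before '#'. */
    static String requestedUrl(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    public static void main(String[] args) {
        // Shortened, hypothetical stand-ins for page1URL and page2URL from the question
        String page1 = "http://myURL.com/myDocs/forms/AllItems.aspx?RootFolder=x";
        String page2 = page1 + "#InplviewHash=Paged%3DTRUE-PageFirstRow%3D31";

        // Both URLs lead to the same request, so the server returns the same page
        System.out.println(requestedUrl(page1).equals(requestedUrl(page2))); // prints "true"
    }
}
```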

This means your website contains JavaScript that, after the original content has been loaded, fetches the contents of page 2 via an AJAX call. As I explained before in the question you removed, jsoup does not run JavaScript, so you still need to identify the AJAX call and the real parameters of that call. Once you have those, you can access the contents of page 2.
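
As a starting point for identifying those parameters, the sketch below decodes the part after the # into name/value pairs. It assumes the layout seen in the question's URL (a key, then '=', then URL-encoded Name=Value pairs separated by '-'); the class name FragmentParams and the shortened URL are hypothetical. Whether the server accepts these values, and under which request, is not something the fragment alone reveals, so the actual AJAX call still has to be confirmed in the browser's network tab.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FragmentParams {
    /**
     * Parses a fragment of the assumed form key=Name%3DValue-Name%3DValue
     * into an ordered map. Values that themselves contain '-' after decoding
     * are not handled by this sketch.
     */
    static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        int hash = url.indexOf('#');
        if (hash < 0) {
            return params; // no fragment, nothing to decode
        }
        String fragment = url.substring(hash + 1);
        int eq = fragment.indexOf('=');
        if (eq < 0) {
            return params; // fragment does not follow the assumed layout
        }
        String decoded;
        try {
            // Undo one level of URL encoding (%3D becomes '=')
            decoded = URLDecoder.decode(fragment.substring(eq + 1), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
        for (String pair : decoded.split("-")) {
            int i = pair.indexOf('=');
            if (i > 0) {
                params.put(pair.substring(0, i), pair.substring(i + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of page2URL from the question
        String page2 = "http://myURL.com/myDocs/forms/AllItems.aspx?RootFolder=x"
                + "#InplviewHash038662ba=Paged%3DTRUE-p_ID%3D85-PageFirstRow%3D31";
        System.out.println(parse(page2)); // prints {Paged=TRUE, p_ID=85, PageFirstRow=31}
    }
}
```

Once the real request has been identified, its URL can be fetched with Jsoup.connect(...).get() in the same way as page1URL in the question.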