JSoup will not fetch all items?

Posted 2020-08-17 17:25

Question:

I am trying to parse a simple list using JSoup. Unfortunately, the program only returns the entries up to those that start with N in the list, and I do not know why. Here is my code:

    public ArrayList<String> initializeMangaNameList(){
        Document doc;
        try {
            doc = Jsoup.connect("http://www.mangahere.com/mangalist/").get();
            Elements items = doc.getElementsByClass("manga_info");
            ArrayList<String> names = new ArrayList<String>();
            for(Element item: items){
                names.add(item.text());
            }
            return names;
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return null;
    }

So why does the List not contain all the entries? Is there an error with the webpage? Or perhaps the parser? Can I use a workaround to bypass this issue? And what is causing the issue in the first place?

Answer 1:

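The issue is caused by a change introduced in JSoup 1.7.2: from that version on, the connection caps the size of the response body by default (1 MB at the time), so a large page is silently cut off before it is parsed. You just need to change the default setting like so: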

    public ArrayList<String> initializeMangaNameList(){
        Document doc;
        try {
            doc = Jsoup.connect("http://www.mangahere.com/mangalist/").maxBodySize(0).get();
            Elements items = doc.getElementsByClass("manga_info");
            ArrayList<String> names = new ArrayList<String>();
            for(Element item: items){
                names.add(item.text());
            }
            return names;
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return null;
    }

The important difference is setting maxBodySize to 0, which removes the limit on the response body size so that the whole page is loaded and parsed. More information can be found in the documentation.
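To see why a body size cap makes a list stop partway through, here is a minimal, standard-library-only sketch of the effect. It is not JSoup's actual implementation: the class `BodyLimitDemo`, its helper methods, the made-up 26-item page, and the 230-byte cap are all assumptions chosen for illustration. The only thing borrowed from the answer is the convention that a limit of 0 means "no limit".

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BodyLimitDemo {

    // Builds a made-up page with one 16-byte list item per letter, A through Z.
    static String buildPage() {
        StringBuilder page = new StringBuilder();
        for (char c = 'A'; c <= 'Z'; c++) {
            page.append("<li>").append(c).append(" title</li>");
        }
        return page.toString();
    }

    // Hypothetical stand-in for a size-capped download:
    // keeps at most maxBytes of the response, where 0 means "no limit".
    static String readBody(byte[] response, int maxBytes) {
        int length = (maxBytes == 0) ? response.length : Math.min(maxBytes, response.length);
        return new String(response, 0, length, StandardCharsets.UTF_8);
    }

    // Collects the text of every complete <li>...</li> item in the (possibly truncated) body.
    static List<String> extractItems(String html) {
        List<String> items = new ArrayList<>();
        Matcher matcher = Pattern.compile("<li>(.*?)</li>").matcher(html);
        while (matcher.find()) {
            items.add(matcher.group(1));
        }
        return items;
    }

    // Convenience wrapper: "download" the demo page with the given cap and parse it.
    static List<String> itemsWithLimit(int maxBytes) {
        byte[] response = buildPage().getBytes(StandardCharsets.UTF_8);
        return extractItems(readBody(response, maxBytes));
    }

    public static void main(String[] args) {
        List<String> capped = itemsWithLimit(230);
        List<String> full = itemsWithLimit(0);
        System.out.println(capped.size() + " items, last: " + capped.get(capped.size() - 1));
        System.out.println(full.size() + " items, last: " + full.get(full.size() - 1));
    }
}
```

With the cap in place, the body ends in the middle of the fifteenth item, so only the first fourteen entries are found and no error is raised. That matches the symptom in the question: the parser works correctly, it just never receives the rest of the page.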