Essentially, I want my program to be like a bulletproof tank: absorb 404 errors and keep on rolling, crushing the interwebs and leaving corpses dead and bloodied in its wake, or, w/e.
I keep getting this error:
Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=404, URL=https://en.wikipedia.org/wiki/Hudson+Township+%28disambiguation%29
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:537)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:493)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:205)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:194)
at Q.Wikipedia_Disambig_Fetcher.all_possibilities(Wikipedia_Disambig_Fetcher.java:29)
at Q.Wikidata_Q_Reader.getQ(Wikidata_Q_Reader.java:54)
at Q.Wikipedia_Disambig_Fetcher.all_possibilities(Wikipedia_Disambig_Fetcher.java:38)
at Q.Wikidata_Q_Reader.getQ(Wikidata_Q_Reader.java:54)
at Q.Runner.main(Runner.java:35)
But I don't understand why, because I check whether the URL is valid before I fetch it. What is incorrect about my checking process?
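From what I can tell, Apache Commons' UrlValidator.isValid() only checks that a URL is syntactically well-formed; it never contacts the server, so a well-formed URL can still come back 404 when actually fetched. A minimal sketch of that gap (class name mine), using the URL from the stack trace above:

import org.apache.commons.validator.routines.UrlValidator;

public class SyntaxVsExistence
{
    public static void main( String[] args )
    {
        UrlValidator urlValidator = new UrlValidator();
        // Syntactically well-formed, so this prints true...
        System.out.println( urlValidator.isValid(
                "https://en.wikipedia.org/wiki/Hudson+Township+%28disambiguation%29" ) );
        // ...but isValid() never talks to the server, so it cannot know
        // that Wikipedia answers this particular path with a 404.
    }
}

I also notice the failing URL encodes spaces as +. As far as I know, + in a URL path is a literal plus sign (the space-as-plus convention only applies to query strings), so that path names a page literally titled "Hudson+Township+(disambiguation)" rather than "Hudson_Township_(disambiguation)", which could itself be what is 404ing.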
I've tried researching other Stack Overflow questions about this problem, but they weren't very authoritative, and I've already implemented a lot of the solutions from this and this; so far nothing has worked.
I'm using the Apache Commons UrlValidator, and this is the most recent code I've been using:
//get its normal wiki disambig page
String URL_check = "https://en.wikipedia.org/wiki/" + associated_alias;
UrlValidator urlValidator = new UrlValidator();
if ( urlValidator.isValid( URL_check ) )
{
    Document docx = Jsoup.connect( URL_check ).get();
    //this can handle the less structured ones.
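If the goal really is tank-like behavior (absorb the 404 and keep rolling), one sketch, mine rather than anything from the project, is to wrap the fetch in a try/catch for org.jsoup.HttpStatusException, the exception Jsoup throws for non-2xx responses; the helper name fetchOrNull is made up:

import java.io.IOException;
import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Fetch a page, absorbing HTTP error statuses instead of crashing.
// Returns null when the page could not be fetched.
static Document fetchOrNull( String url )
{
    try
    {
        return Jsoup.connect( url ).get();
    }
    catch ( HttpStatusException e )
    {
        // Well-formed URL, but the server answered an error status (e.g. 404):
        // log it and let the caller move on to the next alias.
        System.err.println( "Skipping " + e.getUrl() + " (status " + e.getStatusCode() + ")" );
        return null;
    }
    catch ( IOException e )
    {
        // Network-level failure; also survivable.
        e.printStackTrace();
        return null;
    }
}

Then the call site becomes Document docx = fetchOrNull( URL_check ); followed by a null check. Jsoup's Connection.ignoreHttpErrors( true ) is the other route: it suppresses the exception and lets you inspect the status code on the Response yourself.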
and
//check the validity of the URL
String URL_czech = "https://www.wikidata.org/wiki/Special:ItemByTitle?site=en&page=" + associated_alias + "&submit=Search";
UrlValidator urlValidator = new UrlValidator();
if ( urlValidator.isValid( URL_czech ) )
{
    URL wikidata_page = new URL( URL_czech );
    URLConnection wiki_connection = wikidata_page.openConnection();
    BufferedReader wiki_data_pagecontent = new BufferedReader(
            new InputStreamReader(
                    wiki_connection.getInputStream()));
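For the URLConnection branch, here is a sketch of a check that actually asks the server (the helper name existsOnServer is mine; it assumes a HEAD request is acceptable to the site and that anything other than 200 OK should be skipped):

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Unlike UrlValidator.isValid(), this performs a real request and
// reports whether the server answered 200 OK for the URL.
static boolean existsOnServer( String url )
{
    try
    {
        HttpURLConnection connection = (HttpURLConnection) new URL( url ).openConnection();
        connection.setRequestMethod( "HEAD" ); // status line only, no body
        return connection.getResponseCode() == HttpURLConnection.HTTP_OK;
    }
    catch ( IOException e )
    {
        return false; // unreachable or broken connection: treat as missing
    }
}

Calling existsOnServer( URL_czech ) before opening the reader would catch the 404 up front; the trade-off is one extra round trip per URL.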