Apache Nutch error: Injector: java.io.IOException:

Posted 2019-08-07 12:47

Question:

I am using Apache Nutch 1.14 on Windows 10 with Java 1.8. I followed the steps in the tutorial at https://wiki.apache.org/nutch/NutchTutorial.

When I try to inject the URLs into the crawldb by running this command in Cygwin: bin/nutch inject crawl/crawldb urls

I get the following error: Injector: java.io.IOException: (null) entry in command string: null chmod 0644 E:\apache-nutch-1.4\runtime\local\crawl\crawldb.locked at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)

I checked the logs and found this:

2018-01-18 10:55:26,785 ERROR util.Shell - Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

I have searched for this error on several pages, but none of them helped.

Answer 1:

  1. Create a new directory in Windows, e.g. c:\winutil.
  2. Inside winutil, create a bin directory.
  3. Open https://minhaskamal.github.io/DownGit/#/home
  4. Paste https://github.com/steveloughran/winutils/tree/master/hadoop-2.8.1 into that site and download winutil-hadoop2.8.1.
  5. Extract the zip contents into c:\winutil\bin, so that winutils.exe ends up at c:\winutil\bin\winutils.exe (the log shows Hadoop looking for <home>\bin\winutils.exe).
  6. Add a HADOOP_HOME system environment variable and point it to c:\winutil.
  7. Re-run your crawl command in Cygwin (open a new terminal first so the variable is picked up).
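
Before re-running the crawl, it may help to confirm from the Cygwin shell that the setup above took effect. The following is only a rough sketch: check_winutils is a hypothetical helper name (not part of Nutch or Hadoop), and it assumes the c:\winutil layout from the steps above. It just checks that the given directory contains bin/winutils.exe, which is the path the error in the log was complaining about.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a candidate Hadoop home directory
# contains bin/winutils.exe. Takes the directory as $1 (Unix-style path).
check_winutils() {
    local home="$1"
    if [ -z "$home" ]; then
        echo "HADOOP_HOME is not set"
        return 1
    fi
    if [ ! -f "$home/bin/winutils.exe" ]; then
        echo "winutils.exe not found under $home/bin"
        return 1
    fi
    echo "ok: $home/bin/winutils.exe"
    return 0
}

# Under Cygwin, HADOOP_HOME is inherited as a Windows path (e.g. c:\winutil),
# so convert it with cygpath before testing. Skipped when cygpath is absent
# or the variable is empty.
if command -v cygpath >/dev/null 2>&1 && [ -n "${HADOOP_HOME:-}" ]; then
    check_winutils "$(cygpath -u "$HADOOP_HOME")"
fi
```

If this prints "HADOOP_HOME is not set" or reports that winutils.exe is missing, the inject command will most likely fail with the same winutils error as before, so it is worth rechecking steps 5 and 6 first.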