Is it possible to run Hadoop jobs (like the WordCount sample) locally on Windows?

Posted 2019-01-20 01:37

I have Windows 7, Java 8, Maven and Eclipse. I've created a Maven project and used almost exactly the same code as here.

It's just a simple "word count" sample. When I launch the "driver" program from Eclipse and supply the command-line arguments (the input file and the output directory), I get the following error:

Exception in thread "main" java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:404)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:678)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:661)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:639)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:435)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:277)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:125)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:344)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
    at misc.projects.hadoop.exercises.WordCountDriverApp.main(WordCountDriverApp.java:29)

The failing line (WordCountDriverApp.java:29) contains the command to launch the job:

job.waitForCompletion(true)
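For what it's worth, an NPE thrown inside ProcessBuilder.start (the top frame of the trace above) typically means one of the command's elements was null — on Windows that element is usually the winutils.exe path, which Hadoop's Shell fails to resolve when HADOOP_HOME is not set. A minimal, Hadoop-free sketch reproducing that failure mode (the command list here is made up for illustration):

```java
import java.io.IOException;
import java.util.Arrays;

public class NullCommandDemo {
    // Returns true if ProcessBuilder.start() throws NullPointerException
    // for a command list whose first element is null -- the situation
    // Hadoop's Shell creates when the winutils.exe path cannot be resolved.
    static boolean triggersNpe() {
        String winutilsPath = null; // what an unresolved HADOOP_HOME yields
        try {
            new ProcessBuilder(Arrays.asList(winutilsPath, "ls", "C:\\tmp")).start();
            return false;
        } catch (NullPointerException expected) {
            return true; // thrown before any process is launched
        } catch (IOException other) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE thrown: " + triggersNpe()); // prints "NPE thrown: true"
    }
}
```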

I want to make it work and therefore I want to understand something:

Do I have to provide hdfs-site.xml, yarn-site.xml, and so on, if I just want local mode (without any cluster)? I don't have these XML config files at the moment. As far as I remember, the defaults are all fine for local mode, but maybe I'm wrong.
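(For reference: in stock Hadoop 2.x the relevant defaults already select local mode, so the config files would only be spelling out what is already the case. The equivalent explicit settings, inside each file's <configuration> element, would look roughly like this — shown only to illustrate the defaults, not as something you need to add:)

```xml
<!-- core-site.xml: the default filesystem is already the local one -->
<property>
  <name>fs.defaultFS</name>
  <value>file:///</value>
</property>

<!-- mapred-site.xml: the default framework is already "local" (no YARN) -->
<property>
  <name>mapreduce.framework.name</name>
  <value>local</value>
</property>
```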

Is it possible at all under Windows (to launch any Hadoop jobs whatsoever), or is the whole Hadoop thing Linux-only?

P.S.: The Hadoop dependency is the following:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0</version>
    <scope>provided</scope>
</dependency>

Tags: maven hadoop
2 Answers
混吃等死 · answered 2019-01-20 02:22
  1. Download Hadoop 2.6.0 or 2.7.1 compiled for Windows
  2. Create a HADOOP_HOME environment variable pointing to the unpacked directory
  3. Add %HADOOP_HOME%\bin to the PATH environment variable

Source: https://stackoverflow.com/a/27394808/543836
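Concretely, in a Windows command prompt the steps above might look like the following — the install path is an assumption, and note that setx only affects consoles opened afterwards:

```
setx HADOOP_HOME "C:\hadoop-2.7.1"
setx PATH "%PATH%;C:\hadoop-2.7.1\bin"

REM sanity check, in a NEW console window: winutils should run and list C:\
winutils.exe ls C:\
```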

唯我独甜 · answered 2019-01-20 02:30

Hadoop runs on Windows, it is possible, but you'll grow white hair if you try to pull it off on your own.

To start with, all filesystem operations in Hadoop on Windows are routed either through NativeIO, if it is available, or through winutils when NativeIO cannot be loaded. In your case it took the winutils path.

You could make NativeIO available by telling Eclipse where to find it; see How to add native library to “java.library.path” with Eclipse launch (instead of overriding it). You need to add the location of the hadoop-common-project target's bin directory, where you'll find hadoop.dll, which hosts NativeIO. But even after that, you'll still need winutils for container launch. winutils.exe sits in that same location (the hadoop-common target/bin), but the code looks it up via %HADOOP_HOME%, so you'll have to define that too. And it only gets harder from there. I've intentionally omitted the details of how to configure all this, because I don't think you should attempt it — or, more precisely, you should attempt it only once you understand how to do it.
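If you do go down that road anyway, one workaround worth knowing (my addition, not part of the answer above): Hadoop 2.x's Shell class checks the "hadoop.home.dir" system property before falling back to the HADOOP_HOME environment variable, so setting the property at the very top of your driver's main, before any Hadoop class is touched, can stand in for the environment variable when launching from Eclipse. A sketch, with a placeholder path:

```java
public class HadoopHomeWorkaround {
    // Hadoop's Shell consults the "hadoop.home.dir" system property before
    // the HADOOP_HOME environment variable, so setting it programmatically
    // early enough can substitute for the env var in an Eclipse launch.
    public static void configure(String hadoopHome) {
        System.setProperty("hadoop.home.dir", hadoopHome);
    }

    public static void main(String[] args) {
        // "C:\\hadoop-2.7.1" is a placeholder: point it at a Windows build
        // whose bin directory contains winutils.exe and hadoop.dll.
        configure("C:\\hadoop-2.7.1");
        System.out.println(System.getProperty("hadoop.home.dir"));
    }
}
```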

It would be much, much easier to take an off-the-shelf Hadoop distribution for Windows, of which there is exactly one: HDP from Hortonworks. Download it, install it, configure it, and then run against that 'cluster'.
