Reading/Writing files to HDFS from Windows server

Posted 2019-09-10 23:15

Question:

I want to write files to HDFS from a Windows server. The Hadoop cluster runs on Linux. I have searched everywhere, but all I found was Java code that is meant to be run with "hadoop jar".

Can somebody help me understand how to run Java code that writes files to HDFS from Windows? What is required on the Windows box? Even a good link would do.

Answer 1:

You only need to write a simple Java program and run it like a normal .jar file.

In the project, you need to import the Hadoop library.

This is a working example Maven project (I tested it on my cluster):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class WriteFileToHdfs {

    public static void main(String[] args) throws IOException, URISyntaxException {

        // Replace the bracketed placeholders with your NameNode address and port.
        // The URI ends with "/", so file names are appended without another slash.
        String nameNodeUri = "hdfs://[your-namenode-ip]:[the-port-where-hadoop-is-listening]/";

        Configuration configuration = new Configuration();

        // try-with-resources closes the stream first, then the file system,
        // even if the write throws an exception.
        try (FileSystem hdfs = FileSystem.get(new URI(nameNodeUri), configuration);
             FSDataOutputStream out = hdfs.create(new Path(nameNodeUri + "myFile.txt"))) {
            // writeUTF stores a 2-byte length before the text (DataOutput format).
            out.writeUTF("Some text ...");
        }
    }
}
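
One detail of the example above that is easy to miss: FSDataOutputStream extends java.io.DataOutputStream, so writeUTF does not write plain text. It prepends a two-byte length and encodes the string in Java's modified UTF-8. The sketch below uses only java.io, with an in-memory stream standing in for HDFS, to compare the bytes produced by writeUTF with those produced by write(byte[]). The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class WriteUtfDemo {

    // Bytes produced by DataOutputStream.writeUTF: a 2-byte length, then the text.
    public static byte[] viaWriteUtf(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(text);
        }
        return buffer.toByteArray();
    }

    // Bytes produced by writing the UTF-8 encoding directly: the text only.
    public static byte[] viaPlainWrite(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "Some text ...";
        System.out.println("writeUTF bytes:    " + viaWriteUtf(text).length);
        System.out.println("plain write bytes: " + viaPlainWrite(text).length);
    }
}
```

If the file should be readable as plain text (for example with hdfs dfs -cat), writing the encoded bytes with out.write(...) is probably the better fit. Files written with writeUTF are meant to be read back with readUTF.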

Remember to add the dependencies to your pom.xml, along with the instructions to build the manifest file for the main class:

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
    <mainClass>your.cool.package.WriteFileToHdfs</mainClass>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.1</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <artifactId>maven-dependency-plugin</artifactId>
            <executions>
                <execution>
                    <phase>install</phase>
                    <goals>
                        <goal>copy-dependencies</goal>
                    </goals>
                    <configuration>
                        <outputDirectory>${project.build.directory}/lib</outputDirectory>
                    </configuration>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <artifactId>maven-jar-plugin</artifactId>
            <configuration>
                <archive>
                    <manifest>
                        <addClasspath>true</addClasspath>
                        <classpathPrefix>lib/</classpathPrefix>
                        <mainClass>${mainClass}</mainClass>
                    </manifest>
                </archive>
            </configuration>
        </plugin>
    </plugins>
</build>

Then launch the program with the command:

java -jar nameOfTheJarFile.jar

Of course, you need to edit the code with your own package name and NameNode IP address.
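
To avoid editing and rebuilding the jar for every cluster, the NameNode host and port could be taken from the command line instead. The following is a minimal sketch using only java.net.URI. The class name HdfsUriBuilder and its methods are hypothetical, not part of Hadoop. The resulting URI could then be passed to FileSystem.get as in the example above.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HdfsUriBuilder {

    // Builds hdfs://host:port/ from its parts.
    // Throws URISyntaxException if the parts do not form a valid URI.
    public static URI nameNodeUri(String host, int port) throws URISyntaxException {
        return new URI("hdfs", null, host, port, "/", null, null);
    }

    // Resolves a file name against the NameNode root URI.
    public static URI fileUri(URI nameNode, String fileName) {
        return nameNode.resolve(fileName);
    }

    public static void main(String[] args) throws URISyntaxException {
        if (args.length != 2) {
            System.err.println("usage: java HdfsUriBuilder <namenode-host> <port>");
            return;
        }
        URI nameNode = nameNodeUri(args[0], Integer.parseInt(args[1]));
        System.out.println(nameNode);
        System.out.println(fileUri(nameNode, "myFile.txt"));
    }
}
```

Building the URI from its parts, rather than concatenating strings, also avoids accidental double slashes between the NameNode address and the file name.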