Using the storm hdfs connector to write data into

Posted 2019-08-20 16:11

Question:

The source code for the "storm-hdfs connector", which can be used to write data into HDFS, is on GitHub: https://github.com/ptgoetz/storm-hdfs. It includes a topology, "HdfsFileTopology", that writes '|'-delimited data into HDFS. Link: https://github.com/ptgoetz/storm-hdfs/blob/master/src/test/java/org/apache/storm/hdfs/bolt/HdfsFileTopology.java

I have a question about this part of the code:

        Yaml yaml = new Yaml();
        InputStream in = new FileInputStream(args[1]);
        Map<String, Object> yamlConf = (Map<String, Object>) yaml.load(in);
        in.close();
        config.put("hdfs.config", yamlConf);

        HdfsBolt bolt = new HdfsBolt()
                .withConfigKey("hdfs.config")
                .withFsUrl(args[0])
                .withFileNameFormat(fileNameFormat)
                .withRecordFormat(format)
                .withRotationPolicy(rotationPolicy)
                .withSyncPolicy(syncPolicy)
                .addRotationAction(new MoveFileAction().toDestination("/dest2/"));

What does this part of the code do, especially the YAML part?

Answer 1:

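I think the code is quite clear. For HdfsBolt to be able to write into HDFS, it needs information about the HDFS cluster itself, and that is what you supply when you create that YAML file. The code loads the file into a Map, stores that Map in the topology config under the key "hdfs.config", and tells the bolt to look there via withConfigKey("hdfs.config").

To run the topology, you provide the path of that YAML file as a command-line argument.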

Usage: HdfsFileTopology [topology name] [yaml config file]

The author of the library wrote a good description here: Storm-HDFS Usage.

If you read the source code, you will find that the contents of the YAML file are used to configure HDFS. The entries are probably settings like those listed in HDFS Defaults, but I can't be sure.
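
To make that flow concrete, here is a minimal sketch of what I understand the bolt to do with the map stored under the config key: look it up and copy each entry as a string key/value setting. It uses only the JDK so that it runs on its own. The class name HdfsConfigSketch and the method applyHdfsConfig are hypothetical, a plain Map stands in for Hadoop's Configuration object, and the two dfs.* entries are only illustrative examples of what such a YAML file might contain. This is not the library's actual code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not part of storm-hdfs.
class HdfsConfigSketch {

    // Copies every entry found under configKey into a flat String-to-String map.
    // The returned map stands in for Hadoop's Configuration object (assumption).
    @SuppressWarnings("unchecked")
    static Map<String, String> applyHdfsConfig(Map<String, Object> stormConf, String configKey) {
        Map<String, String> hdfsSettings = new LinkedHashMap<>();
        Object nested = stormConf.get(configKey);
        if (nested instanceof Map) {
            for (Map.Entry<String, Object> e : ((Map<String, Object>) nested).entrySet()) {
                // YAML values may be numbers or booleans, so convert them to strings
                hdfsSettings.put(e.getKey(), String.valueOf(e.getValue()));
            }
        }
        // An absent key simply yields no extra settings
        return hdfsSettings;
    }

    public static void main(String[] args) {
        // Stand-in for the Map that yaml.load(in) would return for a file like:
        //   dfs.replication: 2
        //   dfs.blocksize: 134217728
        Map<String, Object> yamlConf = new LinkedHashMap<>();
        yamlConf.put("dfs.replication", 2);
        yamlConf.put("dfs.blocksize", 134217728);

        // Same step as config.put("hdfs.config", yamlConf) in the topology
        Map<String, Object> config = new HashMap<>();
        config.put("hdfs.config", yamlConf);

        // Same lookup key the bolt receives through withConfigKey("hdfs.config")
        System.out.println(applyHdfsConfig(config, "hdfs.config"));
    }
}
```

The point of the indirection is that the topology config is the only thing shipped to the workers, so nesting the HDFS settings under one key keeps them separate from Storm's own settings.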

It is probably better to ask the author of the library.