The source code for the storm-hdfs connector, which can be used to write data into HDFS, is on GitHub: https://github.com/ptgoetz/storm-hdfs. There is a particular topology, "HdfsFileTopology", used to write '|'-delimited data into HDFS: https://github.com/ptgoetz/storm-hdfs/blob/master/src/test/java/org/apache/storm/hdfs/bolt/HdfsFileTopology.java
I have a question about this part of the code:
Yaml yaml = new Yaml();
InputStream in = new FileInputStream(args[1]);
Map<String, Object> yamlConf = (Map<String, Object>) yaml.load(in);
in.close();
config.put("hdfs.config", yamlConf);
HdfsBolt bolt = new HdfsBolt()
        .withConfigKey("hdfs.config")
        .withFsUrl(args[0])
        .withFileNameFormat(fileNameFormat)
        .withRecordFormat(format)
        .withRotationPolicy(rotationPolicy)
        .withSyncPolicy(syncPolicy)
        .addRotationAction(new MoveFileAction().toDestination("/dest2/"));
What does this part of the code do, especially the YAML part?
I think the code is quite clear. In order for HdfsBolt to be able to write into HDFS, it needs information about the HDFS cluster itself, and that is what you provide when you create that YAML file. To run the topology, you pass the path of that YAML file as a command line argument (args[1] in the code above; args[0] is the filesystem URL handed to withFsUrl).
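To make that concrete, here is a minimal sketch of what such a YAML file could look like. The keys depend entirely on your cluster; the entries below (fs.defaultFS, dfs.replication and the Kerberos settings) are only illustrative, not something mandated by the topology:

# hypothetical hdfs-config.yaml -- any Hadoop configuration key/value pairs
# your cluster needs; these specific entries are just examples
fs.defaultFS: "hdfs://namenode.example.com:8020"
dfs.replication: 1
# only relevant on a secured (Kerberos) cluster
hdfs.keytab.file: "/etc/security/keytabs/storm.keytab"
hdfs.kerberos.principal: "storm@EXAMPLE.COM"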
The author of the library made a good description here: Storm-HDFS Usage.
If you read the source code, you will find that the contents of the YAML file are used to configure HDFS: the map is put into the topology config under the key "hdfs.config", and withConfigKey("hdfs.config") tells the bolt where to find it. Probably the entries are standard Hadoop configuration properties, something like HDFS Defaults, but I can't be sure.
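As a rough illustration of the mechanism (a simplified sketch, not the library's exact code), the bolt can pull that map back out of the topology config in prepare() and copy each entry into a Hadoop Configuration object before it opens the FileSystem. Something along these lines:

// Simplified sketch of how the YAML-derived map could be applied to the
// Hadoop client configuration inside a bolt's prepare() method.
// This is an illustration, not the exact storm-hdfs implementation.
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

public class ConfigKeySketch {
    @SuppressWarnings("unchecked")
    public static Configuration buildHdfsConfig(Map stormConf, String configKey) {
        Configuration hdfsConfig = new Configuration();
        Map<String, Object> map = (Map<String, Object>) stormConf.get(configKey);
        if (map != null) {
            // every key/value pair from the YAML file becomes a Hadoop
            // Configuration property (e.g. fs.defaultFS, dfs.replication, ...)
            for (Map.Entry<String, Object> entry : map.entrySet()) {
                hdfsConfig.set(entry.getKey(), String.valueOf(entry.getValue()));
            }
        }
        return hdfsConfig;
    }
}

So, as far as I can tell, whatever you put in the YAML file ends up as plain Hadoop configuration properties for the HDFS client the bolt uses.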
It is probably better to ask the author of the library to be sure.