How can you write to multiple outputs dependent on the key using Scalding(/cascading) in a single Map Reduce Job. I could of course use .filter
for all the possible keys, but that is a horrible hack, which will fire up many jobs.
可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
回答1:
There is TemplatedTsv in Scalding (from version 0.9.0rc16 and up), exactly same as Cascading TemplateTsv.
Tsv(args("input"), ('COUNTRY, 'GDP))
.read
.write(TemplatedTsv(args("output"), "%s", 'COUNTRY))
// it will create a directory for each country under "output" path in Hadoop mode.
回答2:
Use MultipleOutputFormat and extrapolate from these other SO questions to write a custom output class using the output format: Create Scalding Source like TextLine that combines multiple files into single mappers, Compress Output Scalding / Cascading TsvCompressed
回答3:
This suggestion on the Cascading User group suggests to use Cascading TemplateTap. Not sure how to connect this to Scalding though.