I've been stuck for a few days because I want to create a custom map/reduce program based on my Hive query, and after googling I haven't found many examples; I'm still confused about the rules.
What are the rules for creating a custom MapReduce program, and what about the mapper and reducer classes?
Can anyone provide a solution?
I want to develop this program in Java, but I'm still stuck. Also, when formatting output in the collector, how do I format the result in the mapper and reducer classes?
Can anybody give me some examples and an explanation of this kind of stuff?
There are basically two ways to add custom mappers/reducers to Hive queries.
1. TRANSFORM
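A minimal sketch of such a query (the names stuff1, stuff2, script, thing1, thing2, and table1 are placeholders, explained below):

```sql
SELECT TRANSFORM (stuff1, stuff2)
USING 'script'
AS thing1, thing2
FROM table1;
```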
Here stuff1 and stuff2 are fields in table1, and script is any executable that accepts the input format I describe later. thing1 and thing2 are the outputs of script.
2. MAP and REDUCE
This is slightly more complicated but gives more control. There are two parts to this. In the first part, the mapper script receives data from table and maps it to the fields mp1 and mp2. These are then passed on to reduce_script; that script receives output sorted on the key, which we specified with CLUSTER BY mp1. Mind you, more than one key may be handled by a single reducer. The output of the reduce script goes to the table someothertable.
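A sketch of a query of this shape, assuming a source table named table with two columns col1 and col2 (those column names and the final output names are placeholders; the rest follows the description above):

```sql
FROM (
  FROM table
  MAP table.col1, table.col2
  USING 'map_script'
  AS mp1, mp2
  CLUSTER BY mp1
) map_output
INSERT OVERWRITE TABLE someothertable
REDUCE map_output.mp1, map_output.mp2
USING 'reduce_script'
AS reduced1, reduced2;
```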
Now all these scripts follow a simple pattern: they read line by line from stdin, with the fields separated by \t, and they write back to stdout in the same manner (fields separated by '\t'). Check out this blog; there are some nice examples:
http://dev.bizo.com/2009/07/custom-map-scripts-and-hive.html
http://dev.bizo.com/2009/10/reduce-scripts-in-hive.html
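For instance, here is a minimal script in Python illustrating that stdin/stdout contract (the upper-casing of the first field is just a made-up transformation for illustration; a Java program reading System.in line by line works the same way):

```python
#!/usr/bin/env python
# Hypothetical script for Hive TRANSFORM/MAP/REDUCE: reads tab-separated
# fields from stdin, writes tab-separated fields back to stdout.
import sys

def transform_line(line):
    # Strip the trailing newline and split the record into fields on \t.
    fields = line.rstrip("\n").split("\t")
    # Made-up transformation: upper-case the first field, pass the rest through.
    return "\t".join([fields[0].upper()] + fields[1:])

if __name__ == "__main__":
    for line in sys.stdin:
        print(transform_line(line))
```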