In many MapReduce programs, I see a reducer being used as a combiner as well. I know this is because of the specific nature of those programs. But I am wondering if they can be different.
Yes, they can certainly be different, but you probably don't want to use a class with different logic, because you will most likely get unexpected results.
A combiner can only be used when the operation is commutative (a · b = b · a) and associative (a · (b · c) = (a · b) · c). The reason is that a combiner may run on only a subset of your keys and values, may run more than once, or may not run at all, and the output of the program must stay the same in every case.
Choosing a different class with different logic may not give you correct output.
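To make the commutative/associative requirement concrete, here is a minimal sketch in plain Java (no Hadoop dependency). The class name `CombinerSafetyDemo`, the two "spills", and the sample values are made up for illustration. It applies the same function first to each subset and then to the partial results, which is roughly what happens when a reducer class is reused as a combiner.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerSafetyDemo {
    // Reducer-style sum over all values of one key.
    public static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) total += v;
        return total;
    }

    // Reducer-style mean over all values of one key.
    public static double mean(List<Double> values) {
        double total = 0;
        for (double v : values) total += v;
        return total / values.size();
    }

    public static void main(String[] args) {
        // Values for one key, split across two hypothetical map-side spills.
        List<Long> spillA = Arrays.asList(1L, 2L, 3L);
        List<Long> spillB = Arrays.asList(10L);

        // Sum: pre-aggregating each spill does not change the result.
        long direct = sum(Arrays.asList(1L, 2L, 3L, 10L));
        long combined = sum(Arrays.asList(sum(spillA), sum(spillB)));
        System.out.println("sum: " + direct + " vs " + combined);

        // Mean: averaging the per-spill means gives a different answer.
        double directMean = mean(Arrays.asList(1.0, 2.0, 3.0, 10.0));
        double combinedMean = mean(Arrays.asList(
                mean(Arrays.asList(1.0, 2.0, 3.0)),
                mean(Arrays.asList(10.0))));
        System.out.println("mean: " + directMean + " vs " + combinedMean);
    }
}
```

This prints `sum: 16 vs 16` and `mean: 4.0 vs 6.0`. The sum survives pre-aggregation; the mean does not, which is why a mean reducer cannot simply be reused as a combiner.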
Here is an implementation that you can run with or without the combiner; both give exactly the same answer. In it, the Reducer and the Combiner have different purposes and different implementations.
Reduce Class
MapObject Class
Combiner Class
Driver Class
Git the code from here
Suggestions are welcome.
The primary goal of a combiner is to minimize the number of key/value pairs shuffled across the network between mappers and reducers, and thus to save as much bandwidth as possible.
The rule of thumb for a combiner is that its input and output key/value types must be the same. The reason is that the combiner is not guaranteed to run: it may or may not be used, depending on the volume of data and the number of spills.
The reducer can be used as a combiner when it satisfies this rule, i.e. it has the same input and output types.
The other important rule is that a combiner can only be used when the function you want to apply is both commutative and associative, like adding numbers. It does not work for something like an average if you reuse the reducer code as is.
Now to answer your question: yes, of course they can be different. When your reducer has different input and output types, you have no choice but to make a separate copy of your reducer code and modify it.
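As an illustration of a combiner that differs from the reducer, here is a rough sketch of the average case in plain Java, with no Hadoop classes involved. All names (`AverageWithCombiner`, `SumCount`, `map`, `combine`, `reduce`) are hypothetical stand-ins rather than Hadoop APIs. The idea, assuming the mapper emits (sum, count) pairs, is that the combiner keeps the same type on input and output, while the reducer does the final division and returns a different type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AverageWithCombiner {
    // Intermediate value type: a partial sum and the count behind it.
    public static final class SumCount {
        final double sum;
        final long count;

        public SumCount(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Map side: wrap each raw value as (value, 1).
    public static List<SumCount> map(List<Double> raw) {
        List<SumCount> out = new ArrayList<>();
        for (double v : raw) out.add(new SumCount(v, 1));
        return out;
    }

    // Combiner: SumCount in, SumCount out, so it can run zero or more times.
    public static SumCount combine(List<SumCount> parts) {
        double sum = 0;
        long count = 0;
        for (SumCount p : parts) {
            sum += p.sum;
            count += p.count;
        }
        return new SumCount(sum, count);
    }

    // Reducer: different output type; performs the final division.
    public static double reduce(List<SumCount> parts) {
        SumCount total = combine(parts);
        return total.sum / total.count;
    }

    public static void main(String[] args) {
        // Values for one key, split across two hypothetical map-side spills.
        List<SumCount> spillA = map(Arrays.asList(1.0, 2.0, 3.0));
        List<SumCount> spillB = map(Arrays.asList(10.0));

        // Without a combiner: the reducer sees every mapped value.
        List<SumCount> all = new ArrayList<>(spillA);
        all.addAll(spillB);
        double withoutCombiner = reduce(all);

        // With a combiner: the reducer sees one partial result per spill.
        double withCombiner = reduce(Arrays.asList(combine(spillA), combine(spillB)));
        System.out.println(withoutCombiner + " vs " + withCombiner);
    }
}
```

Both paths print `4.0`, because adding partial sums and partial counts is commutative and associative even though averaging itself is not.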
If you are concerned about the logic of the reducer, you can implement the combiner's logic in a different way as well. For example, a combiner can keep a collection object as a local buffer of all the values it receives. This is less risky than doing the same in a reducer, because a reducer is more prone to running out of memory than a combiner. Other logic differences can and do exist.
Yes, a combiner can be different to the Reducer, although your Combiner will still be implementing the Reducer interface. Combiners can only be used in specific cases which are going to be job dependent. The Combiner will operate like a Reducer, but only on the subset of the Key/Values output from each Mapper.
One constraint that your Combiner will have, unlike a Reducer, is that the input/output key and value types must match the output types of your Mapper.