I have some counters that I increment in my Mapper class:
(example written using the appengine-mapreduce Java library v.0.5)
    @Override
    public void map(Entity entity) {
        getContext().incrementCounter("analyzed");
        if (isSpecial(entity)) {
            getContext().incrementCounter("special");
        }
    }
(The method isSpecial just returns true or false depending on the state of the entity; it is not relevant to the question.)
I want to access those counters once the whole job has finished processing, in the finish method of the Output class:
    @Override
    public Summary finish(Collection<? extends OutputWriter<Entity>> writers) {
        // get the counters and save/return the summary
        int analyzed = 0; // getCounter("analyzed");
        int special = 0; // getCounter("special");
        Summary summary = new Summary(analyzed, special);
        save(summary);
        return summary;
    }
... but the method getCounter is only available from the MapperContext class, which is accessible only through the getContext() method of Mappers/Reducers.
How can I access my counters at the Output stage?
Side note: I can't pass the counter values through my output class, because the whole Map/Reduce is about transforming one set of Entities into another (in other words, the counters are not the main purpose of the Map/Reduce). The counters are just for control, so it makes sense to compute them here instead of creating another process just to do the counting.
Thanks.
There is no way to do this from inside the Output today, but feel free to request it here: https://code.google.com/p/appengine-mapreduce/issues/list
What you can do, however, is chain a job to run after your map-reduce; that job will receive its output and counters. There is an example of this here: https://code.google.com/p/appengine-mapreduce/source/browse/trunk/java/example/src/com/google/appengine/demos/mapreduce/entitycount/ChainedMapReduceJob.java
The example above runs 3 MapReduce jobs in a row. Note that the chained steps don't have to be MapReduce jobs: you can create your own class that extends Job and has a run method which creates your Summary object.
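To make the shape of that follow-up job concrete, here is a minimal sketch of the idea, not the library's actual API. Everything in it is a hypothetical stand-in: Counters, MapBackedCounters, SummaryJob and this Summary are defined locally so the snippet compiles on its own. In a real pipeline your class would extend the library's Job type and read the counters from whatever result object the finished map-reduce passes to it, as in the linked ChainedMapReduceJob example.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: every type below is a hypothetical stand-in, not part of
// appengine-mapreduce. It shows the shape of a follow-up step that turns
// the counters of a finished map-reduce into a Summary.
final class ChainedSummarySketch {

    /** Stand-in for the read-only counters handed to the chained job. */
    interface Counters {
        long getValue(String name);
    }

    /** Simple in-memory counters, used here to simulate the map phase. */
    static final class MapBackedCounters implements Counters {
        private final Map<String, Long> values = new HashMap<>();

        void increment(String name) {
            values.merge(name, 1L, Long::sum);
        }

        @Override
        public long getValue(String name) {
            // A counter that was never incremented reads as zero.
            return values.getOrDefault(name, 0L);
        }
    }

    /** Stand-in for the question's Summary (long instead of int). */
    static final class Summary {
        final long analyzed;
        final long special;

        Summary(long analyzed, long special) {
            this.analyzed = analyzed;
            this.special = special;
        }
    }

    /** Stand-in for the chained job: runs after the map-reduce is done. */
    static final class SummaryJob {
        Summary run(Counters counters) {
            return new Summary(
                counters.getValue("analyzed"),
                counters.getValue("special"));
        }
    }

    public static void main(String[] args) {
        // Simulate what the mapper in the question would have counted
        // for three entities, one of which is special.
        MapBackedCounters counters = new MapBackedCounters();
        counters.increment("analyzed");
        counters.increment("analyzed");
        counters.increment("analyzed");
        counters.increment("special");

        Summary summary = new SummaryJob().run(counters);
        System.out.println(summary.analyzed + " analyzed, " + summary.special + " special");
    }
}
```

The point of the design is that the Output stays focused on writing the transformed entities, while saving the Summary moves into the step that actually has the counters available.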