Re: combiner/reducer context in java class
Harsh J 2012-11-06, 11:52
I believe the framework does give a few combiner statistics of its own
(like in/out records and such). If your combiner class is separate,
then instantiating counters in it with apt naming should address the
need, since the class itself will be separately instantiated.
Even if we looked at the task ID, it's currently hard to tell whether the
code is running as a combiner or not. I can only think of hacky ways, like
polling from within to see whether the combiner input records counter
changes with each call (then it's a combiner) or remains as-is (then it's a
reducer). The separate-class way is much more elegant here since you
do want a difference in behavior, and you have inheritance at your
disposal to prevent duplication.
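
A minimal sketch of the separate-class approach described above. It is plain Java with no Hadoop dependency so that it runs on its own: the `Counters` class and the `SumBase`, `SumCombiner`, and `SumReducer` names are hypothetical stand-ins, not Hadoop types, and summing values is just an assumed example of shared logic. In a real job the two subclasses would presumably extend Hadoop's Reducer, bump counters through the task context, and be registered separately as the combiner class and the reducer class.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CombinerReducerSketch {

    /** Hypothetical stand-in for the framework's counter facility. */
    static class Counters {
        private final Map<String, Long> values = new HashMap<>();

        void increment(String name, long delta) {
            values.merge(name, delta, Long::sum);
        }

        long get(String name) {
            return values.getOrDefault(name, 0L);
        }
    }

    /** Shared logic lives here once; subclasses only choose the counter prefix. */
    abstract static class SumBase {
        protected abstract String counterPrefix();

        long reduce(String key, List<Long> values, Counters counters) {
            long sum = 0;
            for (long v : values) {
                sum += v;
            }
            // Counter names differ per subclass, so the two roles never aggregate together.
            counters.increment(counterPrefix() + ".input.records", values.size());
            counters.increment(counterPrefix() + ".output.records", 1);
            return sum;
        }
    }

    /** Class that would be registered as the combiner. */
    static class SumCombiner extends SumBase {
        @Override
        protected String counterPrefix() {
            return "combine";
        }
    }

    /** Class that would be registered as the reducer. */
    static class SumReducer extends SumBase {
        @Override
        protected String counterPrefix() {
            return "reduce";
        }
    }

    public static void main(String[] args) {
        Counters counters = new Counters();
        // Map-side partial aggregation, then the final reduce over partial sums.
        long partial = new SumCombiner().reduce("a", Arrays.asList(1L, 2L, 3L), counters);
        long total = new SumReducer().reduce("a", Arrays.asList(partial, 4L), counters);
        System.out.println("total=" + total);
        System.out.println("combine.input.records=" + counters.get("combine.input.records"));
        System.out.println("reduce.input.records=" + counters.get("reduce.input.records"));
    }
}
```

Because each role is its own class, no runtime detection is needed: the counter name is fixed by whichever class the framework instantiates.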
On Tue, Nov 6, 2012 at 5:12 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
> I agree that the behaviour shouldn't be dynamically changed at runtime with
> regard to the class being used as a Combiner or a Reducer, but someone may
> want to produce counters in order to have an overview of what is happening
> (sanity check). But you really would like to be able to not aggregate the
> same counters between the Combiner and the Reducer. How would someone do
> that? I.e., you can introduce a combine/reduce keyword in the counter names, but
> how would you detect which instantiation is used in which case? I guess
> somehow with the task name it might be possible. Is there a better way?
> BUT if you look at the jobtracker counters summary there is a distinction
> between map and reduce values. Maybe it is enough in this case? (I have
> never used counters inside a combiner so I don't know.)
> On Tue, Nov 6, 2012 at 12:29 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>> Hi Prasad,
>> My reply inline.
>> On Tue, Nov 6, 2012 at 4:15 PM, Prasad GS <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> > I'm setting my combiner and reducer to the same Java class. Is there any API
>> > that could tell me the context in which the Java class is running after the
>> > Hadoop job is submitted to the cluster, i.e., whether the class is running as a
>> > combiner or a reducer?
>> A combiner may run both at the map end and at the reduce end. Even if
>> it is possible to do it, it isn't a healthy idea to have the method's
>> logic detect if it's running as a reducer or as a combiner.
>> > I need this information to change the OutputCollector
>> > in the Java class. Also, I do not want to duplicate the same code as combiner
>> > and reducer with only the OutputCollector changed.
>> Why do you think it would require duplication? Your logic can be built
>> in smaller, independent, reusable functions within the same class, and
>> just applied differently in the reducer implementation and in the
>> combiner implementation. This way, you repeat nothing.
>> > Thanks,
>> > Prasad
>> Harsh J
> Bertrand Dechoux