MapReduce, mail # user - Should mapreduce.ReduceContext reuse same object in nextKeyValue?

Re: Should mapreduce.ReduceContext reuse same object in nextKeyValue?
Eric Sammer 2010-01-13, 00:14
On 1/12/10 6:53 PM, Wilkes, Chris wrote:
> I created my own Writable class to store 3 pieces of information.  In my
> mapreducer.Reducer class I collect all of them and then process as a
> group, ie:
> reduce(key, values, context) {
>   List<Foo> myFoos = new ArrayList<Foo>();
>   for (Foo value : values) {
>    myFoos.add(value);
>   }
> }


> Am I doing something wrong?  Should I expect this VALUEIN object to
> change from underneath me?  I'm using Hadoop 0.20.1 (from a Cloudera
> tarball)

That's the documented behavior. Hadoop reuses the same Writable instance
and replaces the *members* in the readFields() method in most cases (all
cases?). The instance of Foo in your example will be the same object and
simply have its members overwritten after each call to readFields().
Currently, you're building a list of references to the same object. At the
end of your loop, you'll have a list of N entries all containing the same
data. This is one of those "gotchas." If you really need to build a list
like this, you'll have to resort to a deep copy, but you're better off
avoiding it if you can: it will drastically impact performance and adds
the requirement that all values for a given key fit in memory.
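The reuse behavior and the deep-copy fix can be sketched in plain Java without any Hadoop dependency. `Foo` below is a hypothetical stand-in for a Writable, and `readFrom()` simulates what `readFields()` does: it mutates the existing instance in place rather than creating a new one.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    static class Foo {
        int field;
        Foo() {}
        Foo(Foo other) { this.field = other.field; } // deep-copy constructor

        // Simulates readFields(): overwrites this instance's members
        // in place, the way Hadoop's deserialization does.
        void readFrom(int value) { this.field = value; }
    }

    public static void main(String[] args) {
        Foo reused = new Foo(); // single instance, as the framework reuses one

        // Buggy pattern: adding the reused reference itself.
        List<Foo> buggy = new ArrayList<Foo>();
        for (int v = 1; v <= 3; v++) {
            reused.readFrom(v);
            buggy.add(reused); // same object added three times
        }
        // Every entry now reflects the last value read.
        System.out.println(buggy.get(0).field + " " + buggy.get(2).field); // 3 3

        // Fixed pattern: copy the current contents before storing.
        List<Foo> copied = new ArrayList<Foo>();
        for (int v = 1; v <= 3; v++) {
            reused.readFrom(v);
            copied.add(new Foo(reused)); // deep copy preserves each value
        }
        System.out.println(copied.get(0).field + " " + copied.get(2).field); // 1 3
    }
}
```

In real code, instead of writing a copy constructor by hand, you may be able to use Hadoop's `WritableUtils.clone(writable, conf)` to produce the copy, at the same performance and memory cost noted above.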

Hope this helps.
Eric Sammer