MapReduce >> mail # user >> Re: GroupingComparator


Re: GroupingComparator
Hi Dave,

thanks for your reply. It's clearer now; in fact, the code I wrote was
modeled on the old API, where the behavior is different.
So, how can I achieve the same behavior as the old API? I need the
second field of the first key object to stay the same across
iterations, so that I can compare it with the other objects. Do I have
to clone the object?
Thanks.

On 15 October 2012 21:27, Dave Beech <[EMAIL PROTECTED]> wrote:
> Hi Alberto
>
> The iterator you are looping over in your reduce method isn't a
> self-contained list of values. What's actually happening is that
> you're iterating through *part* of the sorted key/value set that was
> sent to that reduce node, and it is the grouping comparator that
> decides when to break that loop and call reduce again on the next key.
>
> Moreover, the "key" object is re-used. So, as you're iterating through
> the values, what's actually happening is that the pointer to the
> associated key data moves along with them, and you're seeing it change.
>
> This only happens in the new "mapreduce" API - in the older "mapred"
> API you get the first key, and it appears to stay the same during the
> loop.
>
> It's sometimes useful behaviour, but it's confusing how the two APIs
> don't act the same.
>
> Hope that helps,
> Dave
>
> On 15 October 2012 20:11, Alberto Cordioli <[EMAIL PROTECTED]> wrote:
>> Hi all,
>>
>> a very strange thing is happening with my Hadoop program.
>> My map simply emits tuples with a custom object as the key (which
>> implements WritableComparable).
>> The object is made of 2 fields, and I implemented my partitioner and
>> grouping comparator class in such a way that only the first field is
>> taken into account.
>> The second field is just a tag and can be 1 or 2.
>>
>> This is the reducer's snippet:
>>
>> tag = key.getSecondField();
>> Iterator it1 = values.iterator();
>> while(it1.hasNext()){
>>         it1.next();
>>         collector.emit(new Text("dummy"), tag);
>> }
>>
>> I would expect every line of my output to be:
>> dummy       1
>> ...
>> dummy       1
>>
>> but actually the value of tag changes over time and I get this kind of output:
>>
>> dummy    1
>> ...
>> dummy    1
>> dummy    2
>> ...
>> dummy    2
>>
>>
>> Could someone explain why, please?
>>
>>
>> Thanks.
>>
>>
>>
>>
>>
>> --
>> Alberto Cordioli

--
Alberto Cordioli
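
The key re-use Dave describes, and one possible answer to the "do I have to clone?" question, can be sketched in a small standalone Java program. This is only an illustration and does not use Hadoop at all: PairKey, Tag, Rec and GroupValues are hypothetical stand-ins for the custom key, its tag field, the sorted map output and the values iterator. It also assumes that getSecondField() returns a reference to a mutable object held inside the key (as it would if the tag were stored as something like an IntWritable), which the thread does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class KeyReuseDemo {

    /** Hypothetical mutable int holder, standing in for something like IntWritable. */
    static final class Tag {
        int value;
    }

    /** One sorted map-output record: (first, tag) -> value. */
    static final class Rec {
        final String first;
        final int tag;
        final String value;
        Rec(String first, int tag, String value) {
            this.first = first;
            this.tag = tag;
            this.value = value;
        }
    }

    /** Hypothetical two-field key that is overwritten in place, never re-created. */
    static final class PairKey {
        String first = "";
        final Tag second = new Tag();
        Tag getSecondField() { return second; }  // a reference, not a copy
        void load(Rec r) { first = r.first; second.value = r.tag; }
    }

    /** Values of one group; every next() reloads the shared key object. */
    static final class GroupValues implements Iterator<String> {
        private final List<Rec> sorted;
        private final PairKey key;
        private final String group;
        private int pos;

        GroupValues(List<Rec> sorted, int start, PairKey key) {
            this.sorted = sorted;
            this.key = key;
            this.pos = start;
            this.group = sorted.get(start).first;
            key.load(sorted.get(start));  // key holds the first record when reduce starts
        }

        // Plays the role of the grouping comparator: only the first field is compared.
        @Override public boolean hasNext() {
            return pos < sorted.size() && sorted.get(pos).first.equals(group);
        }

        @Override public String next() {
            if (!hasNext()) throw new NoSuchElementException();
            Rec r = sorted.get(pos++);
            key.load(r);  // the key "moves" together with the value
            return r.value;
        }
    }

    /** Mirrors the snippet in the thread: the tag reference is taken once, before the loop. */
    static List<Integer> reduceAliased(PairKey key, Iterator<String> values) {
        Tag tag = key.getSecondField();  // same object the key keeps overwriting
        List<Integer> out = new ArrayList<>();
        while (values.hasNext()) {
            values.next();
            out.add(tag.value);
        }
        return out;
    }

    /** Copies the tag's value before iterating, so later key updates cannot touch it. */
    static List<Integer> reduceCopied(PairKey key, Iterator<String> values) {
        int firstTag = key.getSecondField().value;  // snapshot of the first key's tag
        List<Integer> out = new ArrayList<>();
        while (values.hasNext()) {
            values.next();
            out.add(firstTag);
        }
        return out;
    }

    static List<Rec> sample() {
        return Arrays.asList(
            new Rec("a", 1, "v1"), new Rec("a", 1, "v2"),
            new Rec("a", 2, "v3"), new Rec("a", 2, "v4"),
            new Rec("b", 1, "v5"));
    }

    static List<Integer> runAliased() {
        PairKey key = new PairKey();
        return reduceAliased(key, new GroupValues(sample(), 0, key));
    }

    static List<Integer> runCopied() {
        PairKey key = new PairKey();
        return reduceCopied(key, new GroupValues(sample(), 0, key));
    }

    public static void main(String[] args) {
        System.out.println("aliased tag: " + runAliased());  // [1, 1, 2, 2]
        System.out.println("copied tag:  " + runCopied());   // [1, 1, 1, 1]
    }
}
```

If the assumption above holds, the equivalent change in a real reducer would presumably be to copy the value of the second field (into a primitive or a fresh object) before starting the loop, rather than cloning the whole key.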