MapReduce, mail # user - Re: GroupingComparator


Re: GroupingComparator
Alberto Cordioli 2012-10-15, 20:39
Hi Dave,

Thanks for your reply. It's clearer now; in fact, the code I wrote
was modeled on the old API, where the behavior is different.
So, how can I get the old API's behavior? I need the second field of
the first key object to stay the same across iterations, so that I can
compare it with other objects. Do I have to clone the object?
Thanks.

On 15 October 2012 21:27, Dave Beech <[EMAIL PROTECTED]> wrote:
> Hi Alberto
>
> The iterator you are looping over in your reduce method isn't a
> self-contained list of values. What's actually happening is that
> you're iterating through *part* of the sorted key/value set that was
> sent to that reduce node, and it is the grouping comparator that
> decides when to break that loop and call reduce again on the next key.
>
> Moreover, the "key" object is re-used. So, as you iterate through
> the values, the key object is updated in place to match the record
> behind the current value - and you're seeing it change.
>
> This only happens in the new "mapreduce" API - in the older "mapred"
> API you get the first key, and it appears to stay the same during the
> loop.
>
> It's sometimes useful behaviour, but it's confusing how the two APIs
> don't act the same.
>
> Hope that helps,
> Dave
>
> On 15 October 2012 20:11, Alberto Cordioli <[EMAIL PROTECTED]> wrote:
>> Hi all,
>>
>> a very strange thing is happening with my Hadoop program.
>> My map simply emits tuples with a custom object as key (which
>> implements WritableComparable).
>> The object has 2 fields, and I implemented my partitioner and
>> grouping comparator class so that only the first field is taken into
>> account.
>> The second field is just a tag and can be 1 or 2.
>>
>> This is the reducer's snippet:
>>
>> tag = key.getSecondField();
>> Iterator it1 = values.iterator();
>> while(it1.hasNext()){
>>         it1.next();
>>         collector.emit(new Text("dummy"), tag);
>> }
>>
>> I would expect every line of my output to be:
>> dummy       1
>> ...
>> dummy       1
>>
>> but actually the value of tag changes over time and I get this kind of output:
>>
>> dummy    1
>> ...
>> dummy    1
>> dummy    2
>> ...
>> dummy    2
>>
>>
>> Could someone explain why, please?
>>
>>
>> Thanks.
>>
>> --
>> Alberto Cordioli

--
Alberto Cordioli
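
On the "do I have to clone the object?" question: the sketch below is a plain-Java simulation, not Hadoop code, of the key re-use Dave describes. MutableInt, PairKey and valuesOver are hypothetical stand-ins invented for illustration; the assumption is that getSecondField() hands back a mutable holder (something like an IntWritable) that the framework overwrites in place as the values iterator advances. Under that assumption, copying the field's value before the loop should be enough to keep the tag fixed, without cloning the whole key.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class KeyReuseSketch {

    /** Hypothetical stand-in for a mutable Writable field such as IntWritable. */
    static final class MutableInt {
        private int value;
        void set(int v) { value = v; }
        int get() { return value; }
    }

    /** Hypothetical stand-in for the two-field key; only the tag matters here. */
    static final class PairKey {
        private final MutableInt second = new MutableInt();
        MutableInt getSecondField() { return second; }
    }

    /** Mimics the framework: every next() overwrites the one shared key in place. */
    static Iterator<String> valuesOver(final PairKey sharedKey, final int[] tags) {
        return new Iterator<String>() {
            private int i = 0;
            public boolean hasNext() { return i < tags.length; }
            public String next() {
                sharedKey.getSecondField().set(tags[i]);
                return "value" + i++;
            }
            public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    /** Keeps a reference to the field, so the emitted tag follows the re-used key. */
    static List<Integer> reduceByReference(PairKey key, Iterator<String> values) {
        MutableInt tag = key.getSecondField();
        List<Integer> out = new ArrayList<>();
        while (values.hasNext()) {
            values.next();
            out.add(tag.get());
        }
        return out;
    }

    /** Copies the field's value before iterating, so the emitted tag stays fixed. */
    static List<Integer> reduceByCopy(PairKey key, Iterator<String> values) {
        int tag = key.getSecondField().get();
        List<Integer> out = new ArrayList<>();
        while (values.hasNext()) {
            values.next();
            out.add(tag);
        }
        return out;
    }

    /** Runs one simulated reduce call over a group whose tags are 1, 1, 2, 2. */
    static List<Integer> demo(boolean copyFirst) {
        int[] tags = {1, 1, 2, 2};
        PairKey key = new PairKey();
        key.getSecondField().set(tags[0]); // key already holds the first record
        Iterator<String> values = valuesOver(key, tags);
        return copyFirst ? reduceByCopy(key, values) : reduceByReference(key, values);
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // prints [1, 1, 2, 2]
        System.out.println(demo(true));  // prints [1, 1, 1, 1]
    }
}
```

In a real reducer the equivalent would presumably be to read the tag into a primitive, or into a fresh object, before touching the values iterator; whether that matches your key class depends on how getSecondField() is implemented.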