

Alberto Cordioli 2012-10-15, 20:39
Dave Beech 2012-10-15, 20:49
Re: GroupingComparator
Thanks Dave.
You solved my problem. Just one small question about your tip:
I suppose the value returned by iterator.next() is also re-used.
So if I want to store some of the values from the Iterable in the reducer, I
should create a List and put cloned objects into it.
In that case there is no way to avoid the "new" operator, right?
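
A minimal sketch of the copy-before-storing pattern asked about above. It does not use Hadoop at all: MutableInt and ReusingValues are hypothetical stand-ins for a mutable Writable and for a values Iterable that hands back one shared instance on every next() call, which is the behaviour the question assumes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a mutable Writable such as IntWritable.
final class MutableInt {
    private int value;
    MutableInt(int value) { this.value = value; }
    int get() { return value; }
    void set(int value) { this.value = value; }
}

// Hypothetical Iterable that returns the SAME object from every next() call,
// overwriting its contents each time (assumed to mimic the reducer's values).
final class ReusingValues implements Iterable<MutableInt> {
    private final int[] data;
    private final MutableInt shared = new MutableInt(0);

    ReusingValues(int... data) { this.data = data; }

    @Override
    public Iterator<MutableInt> iterator() {
        return new Iterator<MutableInt>() {
            private int i = 0;
            @Override public boolean hasNext() { return i < data.length; }
            @Override public MutableInt next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.set(data[i++]);   // overwrite in place, no allocation
                return shared;
            }
        };
    }
}

class ValueReuseDemo {
    // Keeps the references as returned: every list entry is the same object.
    static List<Integer> storeReferences(Iterable<MutableInt> values) {
        List<MutableInt> kept = new ArrayList<>();
        for (MutableInt v : values) kept.add(v);
        return read(kept);
    }

    // Keeps an independent copy of each value: one allocation per kept value.
    static List<Integer> storeCopies(Iterable<MutableInt> values) {
        List<MutableInt> kept = new ArrayList<>();
        for (MutableInt v : values) kept.add(new MutableInt(v.get()));
        return read(kept);
    }

    // Reads the stored objects only after iteration has finished.
    private static List<Integer> read(List<MutableInt> kept) {
        List<Integer> out = new ArrayList<>();
        for (MutableInt v : kept) out.add(v.get());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(storeReferences(new ReusingValues(1, 2, 3))); // [3, 3, 3]
        System.out.println(storeCopies(new ReusingValues(1, 2, 3)));     // [1, 2, 3]
    }
}
```

Each value that has to outlive its iteration step needs its own storage, so some allocation per kept value is hard to avoid. If only the payload matters, keeping the extracted primitive or String instead of a cloned object is an alternative.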

On 15 October 2012 22:49, Dave Beech <[EMAIL PROTECTED]> wrote:
> Well, if all you need is the tag (the 1 or 2), why not just use a Text
> or IntWritable instance variable. You wouldn't need to clone the whole
> key.
>
> Then, instead of tag = key.getSecondField() you'd say
> tag.set(key.getSecondField().get());
> I don't know what type of object tag is (if it's Text you'll say
> toString() rather than get()), but you see what I mean.
>
> Also - just a tip - try to avoid creating new objects wherever
> possible. You'll get better performance if you create one Text object
> as an instance variable and re-use it by setting the value instead of
> calling new Text("") on every output.
>
> Thanks,
> Dave
>
> On 15 October 2012 21:39, Alberto Cordioli <[EMAIL PROTECTED]> wrote:
>> Hi Dave,
>>
>> thanks for your reply. Now it's clearer; in fact the code that I
>> wrote was modeled on the old API, where the behavior is different.
>> So, how can I achieve the same behavior as the old API? I need the
>> second field of the first key object to stay the same across the
>> iterations, in order to compare it with other objects. Do I have to
>> clone the object?
>>
>>
>> Thanks.
>>
>> On 15 October 2012 21:27, Dave Beech <[EMAIL PROTECTED]> wrote:
>>> Hi Alberto
>>>
>>> The iterator you are looping over in your reduce method isn't a
>>> self-contained list of values. What's actually happening is that
>>> you're iterating through *part* of the sorted key/value set that was
>>> sent to that reduce node, and it is the grouping comparator that
>>> decides when to break that loop and call reduce again on the next key.
>>>
>>> Moreover, the "key" object is re-used. So, as you iterate through
>>> the values, the key object is updated to hold the key data
>>> associated with the current value - and you're seeing it change.
>>>
>>> This only happens in the new "mapreduce" API - in the older "mapred"
>>> API you get the first key, and it appears to stay the same during the
>>> loop.
>>>
>>> It's sometimes useful behaviour, but it's confusing that the two APIs
>>> don't behave the same way.
>>>
>>> Hope that helps,
>>> Dave
>>>
>>> On 15 October 2012 20:11, Alberto Cordioli <[EMAIL PROTECTED]> wrote:
>>>> Hi all,
>>>>
>>>> a very strange thing is happening with my Hadoop program.
>>>> My map simply emits tuples with a custom object as key (which
>>>> implements WritableComparable).
>>>> The object is made of 2 fields, and I implemented my partitioner and
>>>> grouping comparator in such a way that only the first field is taken into
>>>> account.
>>>> The second field is just a tag and can be 1 or 2.
>>>>
>>>> This is the reducer's snippet:
>>>>
>>>> tag = key.getSecondField();
>>>> Iterator it1 = values.iterator();
>>>> while(it1.hasNext()){
>>>>         it1.next();
>>>>         collector.emit(new Text("dummy"), tag);
>>>> }
>>>>
>>>> I would expect every line of my output to be:
>>>> dummy       1
>>>> ...
>>>> dummy       1
>>>>
>>>> but actually the value of tag changes over time and I get this kind of output:
>>>>
>>>> dummy    1
>>>> ...
>>>> dummy    1
>>>> dummy    2
>>>> ...
>>>> dummy    2
>>>>
>>>>
>>>> Could someone explain why, please?
>>>>
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Alberto Cordioli
>>
>>
>>
>> --
>> Alberto Cordioli

--
Alberto Cordioli
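
The key re-use discussed in the quoted messages, and the suggested fix of copying the tag into a re-used instance variable, can be reproduced in a small Hadoop-free simulation. This is only a sketch under stated assumptions: IntHolder, TaggedKey and load() are hypothetical names standing in for a mutable Writable, the two-field key, and in-place deserialization; none of them are Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mutable Writable such as IntWritable.
final class IntHolder {
    private int value;
    int get() { return value; }
    void set(int value) { this.value = value; }
}

// Hypothetical two-field key. load() simulates in-place deserialization:
// it overwrites the existing field objects instead of allocating new ones.
final class TaggedKey {
    private final IntHolder first = new IntHolder();
    private final IntHolder second = new IntHolder();

    void load(int f, int s) { first.set(f); second.set(s); }
    IntHolder getSecondField() { return second; }
}

class TagCaptureDemo {
    // One instance variable, re-used across calls instead of calling "new".
    private final IntHolder tag = new IntHolder();

    // Simulates a single reduce() call over one group of {first, second}
    // records that the grouping comparator treats as the same key.
    List<Integer> reduce(int[][] group, boolean copyTag) {
        TaggedKey key = new TaggedKey();
        key.load(group[0][0], group[0][1]);      // key as seen on entry

        IntHolder seen;
        if (copyTag) {
            tag.set(key.getSecondField().get()); // copy the value out once
            seen = tag;
        } else {
            seen = key.getSecondField();         // alias into the re-used key
        }

        List<Integer> emitted = new ArrayList<>();
        for (int[] record : group) {
            key.load(record[0], record[1]);      // key is overwritten per value
            emitted.add(seen.get());
        }
        return emitted;
    }

    public static void main(String[] args) {
        int[][] group = {{7, 1}, {7, 1}, {7, 2}, {7, 2}};
        System.out.println(new TagCaptureDemo().reduce(group, false)); // [1, 1, 2, 2]
        System.out.println(new TagCaptureDemo().reduce(group, true));  // [1, 1, 1, 1]
    }
}
```

With the alias, the emitted tag follows the key as it is overwritten, which matches the 1...1, 2...2 output reported in the original question. Copying the value into the instance variable before the loop keeps the tag fixed at the first key's value.
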
Dave Beech 2012-10-16, 09:08
Alberto Cordioli 2012-10-16, 09:45
Vinod Kumar Vavilapalli 2012-10-16, 18:44