Avro, mail # user - Map output records/reducer input records mismatch


Re: Map output records/reducer input records mismatch
Vyacheslav Zholudev 2011-08-17, 12:02
btw,

I was thinking of trying it with Utf8 objects instead of Strings, and I wanted to reuse the same Utf8 object instead of creating a new one from a String on each map() call.
Why doesn't the Utf8 class have a method for setting its bytes from a String object?

I created the following code snippet:

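    // Copies the UTF-8 bytes of strToReuse into an existing Utf8 instead of allocating a new one.
    public static Utf8 reuseUtf8Object(Utf8 container, String strToReuse) {
        byte[] strBytes = Utf8.getBytesFor(strToReuse);
        // setByteLength() grows the backing array if it is too small.
        container.setByteLength(strBytes.length);
        System.arraycopy(strBytes, 0, container.getBytes(), 0, strBytes.length);
        return container;
    }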

Would it be useful to have this code encapsulated in the Utf8 class?
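
For illustration, here is a minimal sketch of what such an encapsulated setter might look like. ReusableText is a hypothetical stand-in written only against the JDK; it is not Avro's Utf8 class, and its method names are assumptions. It mirrors the helper above: encode the String, grow the backing array only when it is too small, and copy the bytes in.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a reusable UTF-8 container; NOT Avro's Utf8 class. */
public class ReusableText {
    private byte[] bytes = new byte[0]; // backing buffer, may be longer than 'length'
    private int length;                 // number of valid bytes in the buffer

    /** Overwrites the content with the UTF-8 encoding of s, reusing the buffer when possible. */
    public ReusableText set(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length < encoded.length) {
            bytes = new byte[encoded.length]; // grow only when the old buffer is too small
        }
        System.arraycopy(encoded, 0, bytes, 0, encoded.length);
        length = encoded.length;
        return this;
    }

    public int getByteLength() {
        return length;
    }

    /** Exposes the backing array; only the first getByteLength() bytes are valid. */
    public byte[] getBytes() {
        return bytes;
    }

    @Override
    public String toString() {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusableText text = new ReusableText();
        System.out.println(text.set("hello world"));
        System.out.println(text.set("hi")); // same container, same buffer
    }
}
```

Note that encoding the String still allocates a temporary byte array on every call, so the saving is limited to the container object and, in most calls, its buffer.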

Best,
Vyacheslav

On Aug 17, 2011, at 3:56 AM, Scott Carey wrote:

> On 8/16/11 3:56 PM, "Vyacheslav Zholudev" <[EMAIL PROTECTED]>
> wrote:
>
>> Hi, Scott,
>>
>> thanks for your reply.
>>
>>> What Avro version is this happening with? What JVM version?
>>
>> We are using Avro 1.5.1 and Sun JDK 6, but the exact version I will have
>> to look up.
>>
>>>
>>> On a hunch, have you tried adding -XX:-UseLoopPredicate to the JVM args if
>>> it is Sun and JRE 6u21 or later? (Some issues in loop predicates affect
>>> Java 6 too, just not as many as the recent news on Java 7.)
>>>
>>> Otherwise, it may likely be the same thing as AVRO-782.  Any extra
>>> information related to that issue would be welcome.
>>
>> I will have to collect it. In the meantime, do you have any reasonable
>> explanation for the issue besides it being something like AVRO-782?
>
> What is your key type (map output schema, first type argument of Pair)?
> Is your key a Utf8 or String?  I don't have a reasonable explanation at
> this point; I haven't looked into it in depth with a good reproducible
> case.  I have my suspicions about how recycling of the key works, since Utf8
> is mutable and its backing byte[] can end up shared.
>
>
>
>>
>> Thanks a lot,
>> Vyacheslav
>>
>>>
>>> Thanks!
>>>
>>> -Scott
>>>
>>>
>>>
>>> On 8/16/11 8:39 AM, "Vyacheslav Zholudev"
>>> <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have multiple Hadoop jobs that use the Avro mapred API.
>>>> In only one of the jobs do I see a visible mismatch between the number of
>>>> map output records and reducer input records.
>>>>
>>>> Has anybody encountered such behavior? Can anybody think of possible
>>>> explanations for this phenomenon?
>>>>
>>>> Any pointers/thoughts are highly appreciated!
>>>>
>>>> Best,
>>>> Vyacheslav
>>>
>>>
>>
>> Best,
>> Vyacheslav
>>
>>
>>
>
>
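
The suspicion quoted above, that recycling a mutable key can go wrong when its storage ends up shared, can be illustrated in general terms with plain Java. This is only a sketch of the aliasing hazard, not a reproduction of the mismatch reported in this thread or of AVRO-782; MutableKey and KeyRecyclingDemo are hypothetical names, and no Avro classes are involved.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyRecyclingDemo {

    /** Hypothetical mutable key, standing in for any reusable text object. */
    static final class MutableKey {
        private final StringBuilder value = new StringBuilder();

        MutableKey set(String s) {
            value.setLength(0);
            value.append(s);
            return this;
        }

        MutableKey copy() {
            return new MutableKey().set(value.toString());
        }

        @Override
        public String toString() {
            return value.toString();
        }
    }

    /** Retains the recycled instance itself, so every entry aliases one object. */
    static List<String> collectWithoutCopy(String[] inputs) {
        MutableKey recycled = new MutableKey();
        List<MutableKey> retained = new ArrayList<>();
        for (String in : inputs) {
            retained.add(recycled.set(in));
        }
        return render(retained);
    }

    /** Retains a defensive copy, so each entry keeps the value it had when added. */
    static List<String> collectWithCopy(String[] inputs) {
        MutableKey recycled = new MutableKey();
        List<MutableKey> retained = new ArrayList<>();
        for (String in : inputs) {
            retained.add(recycled.set(in).copy());
        }
        return render(retained);
    }

    private static List<String> render(List<MutableKey> keys) {
        List<String> out = new ArrayList<>();
        for (MutableKey k : keys) {
            out.add(k.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] inputs = {"a", "b", "c"};
        System.out.println(collectWithoutCopy(inputs)); // prints [c, c, c]
        System.out.println(collectWithCopy(inputs));    // prints [a, b, c]
    }
}
```

Any code that holds on to a recycled key after the next record is read needs a copy like the one in collectWithCopy; whether that is what happens in the job discussed here remains unconfirmed.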
