Mapreduce Strings from reader, when Avro is clearly Utf8
Anna Lahoud 2013-08-27, 20:32
I am experiencing a problem and I found that another user wrote in about
this same issue in March 2013 but there were no replies to his question. I
am really hoping that there is someone who can explain this or offer
suggestions. I cut and paste his message in since I could only find it in
I have Avro files that clearly contain Utf8 and if I run non-mapreduce, I
get Utf8 out. However, with the same files, I get String objects back from
the mapper. Help!?!?!
Message-ID: <[EMAIL PROTECTED]>
Date: Fri, 08 Mar 2013 10:31:38 -0800
From: Pierre Mariani <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: String types in GenericRecord when using mapreduce vs mapred
Depending on the version of the hadoop api I am using, I am getting
generic avro objects that use either Utf8 or java.lang.String to
represent avro strings...
The existing Hadoop job is defined using the old API (mapred). It
works with Avro files and generic records.
The objects are records. One of their fields is "Key", and its value is
In my mapper, I print the class of the value of the "Key" field for
private static class DiffMapper extends AvroMapper<GenericRecord,
public void map(GenericRecord record, AvroCollector<Pair<Utf8,
GenericRecord>> collector, Reporter reporter)
rest of mapper code
This prints org.apache.avro.util.Utf8
After I ported my job to the new API (mapreduce, see code below), the
same check reports that the value is of type String.
private static class DiffMapper extends
Mapper<AvroKey<GenericData.Record>, NullWritable, Text,
public void map(AvroKey<GenericData.Record> key, NullWritable value,
throws IOException, InterruptedException
GenericData.Record record = key.datum();
rest of mapper code
Is there a way to get the first behavior (strings are Utf8) with the
mapreduce API? I am using 1.7.3 from Maven Central.
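
This does not answer the question above, but while the two APIs disagree, one possible workaround is to make the mapper code indifferent to which representation it receives. The sketch below assumes the code only needs the text of the field, and relies on the fact that both java.lang.String and org.apache.avro.util.Utf8 implement CharSequence. The class and method names (AvroStringCompat, asJavaString, sameText) are hypothetical, and a StringBuilder stands in for Utf8 so that the example runs without Avro on the classpath.

```java
import java.util.Objects;

/**
 * Hypothetical helper: normalizes a field value that may arrive as a
 * java.lang.String or as another CharSequence (for example Avro's Utf8).
 */
final class AvroStringCompat {
    private AvroStringCompat() {}

    /**
     * Returns the text of the value as a java.lang.String, or null if the
     * value is null. Rejects values that are not CharSequences at all.
     */
    static String asJavaString(Object fieldValue) {
        if (fieldValue == null) {
            return null;
        }
        if (fieldValue instanceof CharSequence) {
            return fieldValue.toString();
        }
        throw new IllegalArgumentException(
            "Expected a CharSequence but got " + fieldValue.getClass().getName());
    }

    /** Compares two values by their text, ignoring which class carries it. */
    static boolean sameText(Object a, Object b) {
        return Objects.equals(asJavaString(a), asJavaString(b));
    }

    public static void main(String[] args) {
        // StringBuilder is only a stand-in for Utf8 (assumption for this demo).
        Object fromOldApi = new StringBuilder("k1");
        Object fromNewApi = "k1";
        System.out.println(fromOldApi.getClass().getName());
        System.out.println(fromNewApi.getClass().getName());
        System.out.println(sameText(fromOldApi, fromNewApi));
    }
}
```

In a mapper this would be used as something like asJavaString(record.get("Key")). Comparing by text rather than with equals() matters because, as far as I know, a Utf8 never equals a String even when the characters match.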