Avro >> mail # user >> String types in GenericRecord when using mapreduce vs mapred


String types in GenericRecord when using mapreduce vs mapred

Depending on which version of the Hadoop API I use, I get generic Avro
objects that represent Avro strings as either Utf8 or java.lang.String.

My existing Hadoop job is defined with the old API (mapred) and works
with Avro files and generic records. The objects are records; one of
their fields is "Key", and its value is a string.

In my mapper, I print the class of the value of the "Key" field for
debugging purposes:

private static class DiffMapper
        extends AvroMapper<GenericRecord, Pair<Utf8, GenericRecord>> {

    @Override
    public void map(GenericRecord record,
                    AvroCollector<Pair<Utf8, GenericRecord>> collector,
                    Reporter reporter) throws IOException {
        // Print the runtime class of the "Key" field for debugging.
        System.out.println(record.get("Key").getClass());
        // ... rest of mapper code ...
    }
}

This prints org.apache.avro.util.Utf8. After I ported the job to the
new API (mapreduce, see code below), the same debug line reports that
the value is a java.lang.String.

private static class DiffMapper
        extends Mapper<AvroKey<GenericData.Record>, NullWritable,
                       Text, AvroValue<GenericData.Record>> {

    @Override
    public void map(AvroKey<GenericData.Record> key, NullWritable value,
                    Context context) throws IOException, InterruptedException {
        GenericData.Record record = key.datum();
        // Same debug line: here it prints java.lang.String.
        System.out.println(record.get("Key").getClass());
        // ... rest of mapper code ...
    }
}
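In the meantime I am using a defensive workaround (my own sketch, not from the Avro docs): since Utf8 and String are both CharSequence implementations but do not compare equal to each other, I normalize the field with toString() before using it. StringBuilder stands in for Utf8 below so the example compiles and runs without Avro on the classpath.

```java
// Sketch: make mapper logic independent of whether the runtime
// hands back Utf8 or java.lang.String for an Avro string field.
// StringBuilder is a stand-in for Utf8 (both implement CharSequence).
public class KeyNormalize {
    static String normalize(Object fieldValue) {
        // record.get("Key") may be Utf8 or String; toString() unifies both.
        return fieldValue == null ? null : fieldValue.toString();
    }

    public static void main(String[] args) {
        Object viaMapreduce = "abc";                 // new API: java.lang.String
        Object viaMapred = new StringBuilder("abc"); // old API: Utf8 stand-in
        System.out.println(normalize(viaMapreduce).equals(normalize(viaMapred)));
        // prints true
    }
}
```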

Is there a way to get the first behavior (strings as Utf8) with the
mapreduce API? I am using Avro 1.7.3 from Maven Central.
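One thing I noticed while digging (an assumption on my part; I have not confirmed it explains the mapreduce code path): Avro's Java readers can be steered per-field by the avro.java.string schema property, so it may be worth checking whether the schema the new job reads with carries it, e.g.:

```json
{"name": "Key", "type": {"type": "string", "avro.java.string": "String"}}
```

With that property absent, GenericDatumReader normally yields Utf8; with it set to "String", it yields java.lang.String.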

Thank you