Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Avro >> mail # user >> Re: Mapreduce Strings from reader, when Avro is clearly Utf8


Copy link to this message
-
Re: Mapreduce Strings from reader, when Avro is clearly Utf8
Anna Lahoud <[EMAIL PROTECTED]> writes:

> I am experiencing a problem and I found that another user wrote in
> about this same issue in March 2013 but there were no replies to his
> question. I am really hoping that there is someone who can explain
> this or offer suggestions. I cut and paste his message in since I
> could only find it in an archive.
>
> I have Avro files that clearly contain Utf8 and if I run
> non-mapreduce, I get Utf8 out. However, with the same files, I get
> String objects back from the mapper. Help!?!?!

There are some confusing differences between the now-named “data models”
used by the `mapred` vs `mapreduce` APIs.  

The Generic{Data,Datum{Reader,Writer}} and Specific implementations
generate `Utf8` instances by default.  The Reflect implementation
generates `String` instances only(?).

In 1.7.4 and earlier: The `mapred` API defaults to using the Specific
implementations (producing `Utf8`s), but may be configured to use the
Reflect implementations via the `...mapred.AvroJob.setReflect()` method.
The `mapreduce` API uses the Reflect implementations and cannot be
configured – and thus always produces `String` instances.  So no dice.

In 1.7.5 (and I hope later): Both the APIs allow you to specify the data
model as a sub-class of `GenericData`.  For example:

    import org.apache.avro.mapreduce.AvroJob;
    ....
    AvroJob.setDataModelClass(job, GenericData.class);

So-setting the job data model should yield the `Utf8` instances you’re
hoping for.

HTH,

-Marshall
+
Anna Lahoud 2013-08-27, 20:32
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB