Re: Mapreduce Strings from reader, when Avro is clearly Utf8
Anna Lahoud <[EMAIL PROTECTED]> writes:

> I am experiencing a problem and I found that another user wrote in
> about this same issue in March 2013 but there were no replies to his
> question. I am really hoping that there is someone who can explain
> this or offer suggestions. I cut and paste his message in since I
> could only find it in an archive.
>
> I have Avro files that clearly contain Utf8 and if I run
> non-mapreduce, I get Utf8 out. However, with the same files, I get
> String objects back from the mapper. Help!?!?!

There are some confusing differences between the now-named “data models”
used by the `mapred` vs `mapreduce` APIs.  

The Generic{Data,Datum{Reader,Writer}} and Specific implementations
generate `Utf8` instances by default.  The Reflect implementation, as
far as I can tell, generates only `String` instances.
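A quick way to see this outside MapReduce is to round-trip a record
through the Generic data model — a sketch, assuming Avro 1.7.x on the
classpath (the record and field names here are made up):

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class Utf8Demo {
    // Round-trip one record through the Generic data model and return
    // whatever object comes back for its string field.
    static Object roundTrip() throws Exception {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"R\",\"fields\":"
            + "[{\"name\":\"s\",\"type\":\"string\"}]}");

        // Write a record whose "s" field starts life as a plain String.
        GenericRecord rec = new GenericData.Record(schema);
        rec.put("s", "hello");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(rec, enc);
        enc.flush();

        // Read it back with the Generic reader.
        BinaryDecoder dec =
            DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        return new GenericDatumReader<GenericRecord>(schema)
            .read(null, dec).get("s");
    }

    public static void main(String[] args) throws Exception {
        // The Generic model hands back Utf8, not String.
        System.out.println(roundTrip().getClass().getSimpleName());
    }
}
```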

In 1.7.4 and earlier: The `mapred` API defaults to the Specific
implementations (producing `Utf8`s), but may be configured to use the
Reflect implementations via the `...mapred.AvroJob.setReflect()` method.
The `mapreduce` API uses the Reflect implementations and cannot be
configured, so it always produces `String` instances.  No dice there.
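For the record, the 1.7.4-era `mapred` job setup looks roughly like
this (a sketch; `schema` stands in for whatever your job actually
reads):

    import org.apache.avro.mapred.AvroJob;
    import org.apache.hadoop.mapred.JobConf;
    ....
    JobConf conf = new JobConf();
    AvroJob.setInputSchema(conf, schema);  // default Specific/Generic -> Utf8
    AvroJob.setReflect(conf);              // opt in to Reflect -> String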

In 1.7.5 (and, I hope, later): both APIs let you specify the data
model as a subclass of `GenericData`.  For example:

    import org.apache.avro.mapreduce.AvroJob;
    ....
    AvroJob.setDataModelClass(job, GenericData.class);

Setting the job's data model this way should yield the `Utf8` instances
you’re hoping for.
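With the data model set that way, a `mapreduce`-API mapper should see
`Utf8` again.  A sketch (the schema and the `"s"` field name are made
up):

    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.mapred.AvroKey;
    import org.apache.avro.util.Utf8;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CheckMapper extends
        Mapper<AvroKey<GenericRecord>, NullWritable,
               NullWritable, NullWritable> {
      @Override
      protected void map(AvroKey<GenericRecord> key, NullWritable ignore,
                         Context ctx)
          throws java.io.IOException, InterruptedException {
        Object field = key.datum().get("s");  // hypothetical field name
        assert field instanceof Utf8;         // not String, under GenericData
      }
    }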

HTH,

-Marshall
Anna Lahoud 2013-08-27, 20:32