
Re: Issue with a union with bytes and DataFileReader?
Thanks Doug. Makes perfect sense... I just hadn't found the ticket.

Internally, we just made a new InputFormat that uses JsonEncoder.
2013/5/6 Doug Cutting <[EMAIL PROTECTED]>

> This has been previously reported as:
> https://issues.apache.org/jira/browse/AVRO-1275
> Please also note that GenericData#toString() does not always produce
> output that JsonDecoder can read.  If you're using JsonDecoder then
> you should also use JsonEncoder.  That said, some folks don't like the
> way that those classes encode unions and prefer the JSON that
> GenericData#toString() generates.
> A union between, e.g., a string and an enum can produce ambiguous JSON.
>  To resolve this, JsonEncoder/Decoder tags union values (except unions
> with null) with the intended type.  A union between string and an enum
> named Flavor with values SWEET and SOUR might be rendered by
> JsonEncoder as {"string":"SOUR"} or {"Flavor":"SOUR"}, while
> GenericData#toString() would print "SOUR" in both cases.
> The wrapping of all "bytes" values in {"bytes": ...} by
> GenericData#toString() is separate and should probably be considered a
> bug.  Unfortunately fixing it would be an incompatible change, so
> should probably wait until release 1.8.
> Doug
> On Thu, Apr 25, 2013 at 6:26 AM, Jonathan Coveney <[EMAIL PROTECTED]>
> wrote:
> > This should replicate the issue on 1.7.4:
> > https://gist.github.com/jcoveney/5459644
> >
> > Basically, when using DataFileReader to read a union of bytes, it's
> > outputting in the form of {"bytes": "<thebytes>"}, which it doesn't do for
> > any other union types.
> >
> > Is this expected? Is this a bug?
> >
> > I appreciate your help,
> > Jon
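
To make the string/enum ambiguity Doug describes concrete, here is a minimal sketch in plain Java. It is a toy model, not Avro's implementation or API: the class UnionJsonSketch and its methods untagged and tagged are hypothetical names, and it assumes values that need no JSON escaping. It only mimics the two output shapes mentioned in the thread.

```java
// Toy model only: this does not call Avro and is not how Avro is implemented.
// It reproduces the two JSON shapes described in the thread for a union
// between "string" and an enum named Flavor.
final class UnionJsonSketch {

    private UnionJsonSketch() {}

    // Untagged shape, as the thread describes for GenericData#toString():
    // only the value is printed, so the union branch is lost.
    // Assumes the value contains no characters that need JSON escaping.
    static String untagged(String value) {
        return "\"" + value + "\"";
    }

    // Tagged shape, as the thread describes for JsonEncoder:
    // the value is wrapped in an object keyed by the branch's type name.
    // Assumes neither argument contains characters that need JSON escaping.
    static String tagged(String branchTypeName, String value) {
        return "{\"" + branchTypeName + "\":\"" + value + "\"}";
    }
}
```

With this sketch, untagged("SOUR") yields "SOUR" whether the value came from the string branch or the Flavor branch, so a reader cannot tell them apart. tagged("string", "SOUR") and tagged("Flavor", "SOUR") yield {"string":"SOUR"} and {"Flavor":"SOUR"}, which stay distinguishable.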