Avro >> mail # user >> Issue with a union with bytes and DataFileReader?


Jonathan Coveney 2013-04-25, 13:26
Jonathan Coveney 2013-05-06, 20:21
Doug Cutting 2013-05-06, 20:47
Re: Issue with a union with bytes and DataFileReader?
Thanks Doug. Makes perfect sense... I just hadn't found the ticket.

Internally, we just made a new InputFormat that uses JsonEncoder.
2013/5/6 Doug Cutting <[EMAIL PROTECTED]>

> This has been previously reported as:
>
> https://issues.apache.org/jira/browse/AVRO-1275
>
> Please also note that GenericData#toString() does not always produce
> output that JsonDecoder can read.  If you're using JsonDecoder then
> you should also use JsonEncoder.  That said, some folks don't like the
> way that those classes encode unions and prefer the JSON that
> GenericData#toString() generates.
>
> A union between, e.g., a string and an enum can produce ambiguous JSON.
> To resolve this, JsonEncoder/Decoder tags union values (except unions
> with null) with the intended type.  A union between string and an enum
> named Flavor with values SWEET and SOUR might be rendered by
> JsonEncoder as {"string":"SOUR"} or {"Flavor":"SOUR"}, while
> GenericData#toString() would print "SOUR" in both cases.
>
> The wrapping of all "bytes" values in {"bytes": ...} by
> GenericData#toString() is separate and should probably be considered a
> bug.  Unfortunately fixing it would be an incompatible change, so
> should probably wait until release 1.8.
>
> Doug
>
> On Thu, Apr 25, 2013 at 6:26 AM, Jonathan Coveney <[EMAIL PROTECTED]>
> wrote:
> > This should replicate the issue on 1.7.4:
> > https://gist.github.com/jcoveney/5459644
> >
> > Basically, when using DataFileReader to read a union of bytes, it's
> > outputting in the form of {"bytes": "<thebytes>"}, which it doesn't do for
> > any other union types.
> >
> > Is this expected? Is this a bug?
> >
> > I appreciate your help,
> > Jon
>
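
To make the ambiguity Doug describes concrete, here is a minimal, library-free sketch of the two renderings for a union of string and the Flavor enum. It does not call Avro and is not how JsonEncoder or GenericData#toString() are implemented; the class UnionTagSketch and its methods untagged, tagged and branchOf are hypothetical names made up for illustration, and the sketch assumes values that need no JSON escaping.

```java
// Illustration only: hand-rolled strings that mimic the two JSON shapes
// discussed in the thread. This is NOT Avro code and uses no Avro classes.
public class UnionTagSketch {

    // Untagged shape (what the thread says GenericData#toString() prints):
    // a string value and an enum symbol both become a bare JSON string.
    public static String untagged(String value) {
        return "\"" + value + "\"";
    }

    // Tagged shape (what the thread says JsonEncoder writes): the value is
    // wrapped in a one-field object whose key names the union branch.
    public static String tagged(String branch, String value) {
        return "{\"" + branch + "\":\"" + value + "\"}";
    }

    // Recover the branch name from text produced by tagged().
    // Assumes the exact shape {"<branch>":"<value>"} with no escaping.
    public static String branchOf(String taggedJson) {
        int start = taggedJson.indexOf('"') + 1;
        int end = taggedJson.indexOf('"', start);
        return taggedJson.substring(start, end);
    }

    public static void main(String[] args) {
        // The same text "SOUR" could be a string or the Flavor symbol SOUR.
        String fromString = untagged("SOUR");
        String fromEnum = untagged("SOUR");
        // Untagged output is identical, so a reader cannot pick the branch.
        System.out.println(fromString.equals(fromEnum));

        // Tagged output keeps the two cases apart.
        System.out.println(tagged("string", "SOUR"));
        System.out.println(tagged("Flavor", "SOUR"));
        System.out.println(branchOf(tagged("Flavor", "SOUR")));
    }
}
```

The point of the sketch is only that the untagged form loses the branch while the tagged form keeps it, which is why output from one style cannot reliably be read back by a decoder that expects the other.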