

Re: Generic data extraction from an Avro file
Thanks for the clarification.

Is there any way to use JsonEncoder in the scenario I mentioned, i.e. in
totally schema-agnostic data extraction from either binary or JSON files?
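
To make the question concrete, here is a rough sketch of what such a JsonEncoder-based extraction might look like. It assumes the input is an Avro data file (container format), so the writer schema can be taken from the file header via getSchema() rather than supplied by the application. The class and method names (AvroJsonEncoderDump, dump) are made up for illustration.

```java
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: names are illustrative, not part of Avro.
public class AvroJsonEncoderDump {

    // Reads every record from an Avro data file stream and returns one
    // JsonEncoder-produced JSON string per record.
    static List<String> dump(InputStream in) throws IOException {
        List<String> out = new ArrayList<>();
        try (DataFileStream<GenericRecord> stream =
                 new DataFileStream<>(in, new GenericDatumReader<GenericRecord>())) {
            // Assumption: the schema stored in the file header is the one to encode with.
            Schema schema = stream.getSchema();
            GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
            while (stream.hasNext()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, buf);
                writer.write(stream.next(), encoder);
                encoder.flush(); // the encoder buffers, so flush before reading buf
                out.add(buf.toString("UTF-8"));
            }
        }
        return out;
    }
}
```

The application itself never needs to know the schema in advance here; it only passes along whatever the file declares.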
On Tue, Feb 5, 2013 at 2:58 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:

> Yes, GenericData.Record#toString() should generate valid JSON.  It
> does lose some information, e.g.:
>  - record names; and
>  - the distinction between strings & enum symbols, ints & longs,
> floats & doubles, and maps & records.
>
> JsonEncoder loses less information.  It saves enough information to,
> with the schema, always reconstitute an equivalent object.
>
> Doug
>
>
> On Tue, Feb 5, 2013 at 11:53 AM, Public Network Services
> <[EMAIL PROTECTED]> wrote:
> > Folks,
> >
> > Assuming an application that only needs to quickly examine the contents of a
> > bunch of Avro data files (irrespective of binary or JSON encoding and
> > without any prior schema or object structure knowledge), an approach could
> > be to just extract the Avro records as text JSON records. To this effect, a
> > simple approach could be:
> >
> > 1. Create a DataFileStream<GenericRecord>(FileInputStream,
> >    GenericDatumReader<GenericRecord>) from a FileInputStream to the file.
> >    (If the file is not an Avro data file, an IOException is thrown.)
> > 2. Read GenericRecord records from the DataFileStream object, while its
> >    hasNext() method returns true.
> > 3. Convert each GenericRecord object read into a JSON string, by calling
> >    its toString() method.
> >
> > For the test datasets in the Avro 1.7.3 distribution, this actually works
> > fine.
> >
> > My question is, does anyone see any potential problems for (binary or JSON
> > encoded) Avro data files, given the above logic? For example, should the
> > GenericRecord.toString() method always produce a valid JSON string?
> >
> > Thanks!
> >
>
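
For reference, the three steps quoted above might be sketched roughly as follows. This is a minimal illustration, not a tested tool: the class and method names (AvroToStringDump, dump) are hypothetical, and it assumes the input is an Avro data file, since anything else makes the DataFileStream constructor throw an IOException.

```java
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: names are illustrative, not part of Avro.
public class AvroToStringDump {

    // Returns the toString() form of every record in an Avro data file stream.
    static List<String> dump(InputStream in) throws IOException {
        List<String> out = new ArrayList<>();
        // Step 1: open the stream; no schema is passed, the reader uses the
        // one stored in the file header.
        try (DataFileStream<GenericRecord> stream =
                 new DataFileStream<>(in, new GenericDatumReader<GenericRecord>())) {
            // Step 2: read records while hasNext() returns true.
            while (stream.hasNext()) {
                GenericRecord record = stream.next();
                // Step 3: toString() yields JSON text, with the information
                // loss Doug describes (e.g. ints vs. longs).
                out.add(record.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of an Avro data file.
        try (InputStream in = new FileInputStream(args[0])) {
            for (String json : dump(in)) {
                System.out.println(json);
            }
        }
    }
}
```

Returning a list keeps the sketch short; for large files, printing each record inside the loop would avoid holding everything in memory.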