Avro user mailing list: Generic data extraction from an Avro file


Public Network Services 2013-02-05, 19:53
Re: Generic data extraction from an Avro file
Yes, GenericData.Record#toString() should generate valid JSON. It
does lose some information, e.g.:
 - record names; and
 - the distinction between strings & enum symbols, ints & longs,
floats & doubles, and maps & records.
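A minimal sketch of that information loss, assuming Avro's Java API (1.7.x or later) is on the classpath; the class name `ToStringLoss` and both schemas are hypothetical. Two records that differ in record name and in int versus long field type print identically, and a plain map with the same contents prints the same way:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;

public class ToStringLoss {
    // Hypothetical schemas: different record names, int vs. long field "x".
    static final Schema INT_SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"A\",\"fields\":[{\"name\":\"x\",\"type\":\"int\"}]}");
    static final Schema LONG_SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"B\",\"fields\":[{\"name\":\"x\",\"type\":\"long\"}]}");

    static String intRecord() {
        GenericData.Record r = new GenericData.Record(INT_SCHEMA);
        r.put("x", 1);
        return r.toString();
    }

    static String longRecord() {
        GenericData.Record r = new GenericData.Record(LONG_SCHEMA);
        r.put("x", 1L);
        return r.toString();
    }

    // A map (not a record) holding the same key and value.
    static String mapOfSameContents() {
        Map<String, Object> m = new HashMap<>();
        m.put("x", 1);
        return GenericData.get().toString(m);
    }

    public static void main(String[] args) {
        // Neither the record name, nor int vs. long, nor record vs. map
        // is visible in the printed text.
        System.out.println(intRecord());
        System.out.println(longRecord());
        System.out.println(mapOfSameContents());
    }
}
```

Given only this text, a reader cannot tell which of the three values it came from; that is the ambiguity the schema-aware encoder avoids.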

JsonEncoder loses less information.  It saves enough information to,
with the schema, always reconstitute an equivalent object.
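The contrast with JsonEncoder can be sketched as a round trip, again assuming Avro's Java API (1.7.x or later); the class name `JsonRoundTrip` and the `User` schema are hypothetical. The record is written with `EncoderFactory.get().jsonEncoder(...)` and read back with `DecoderFactory.get().jsonDecoder(...)` using the same schema:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class JsonRoundTrip {
    // Hypothetical schema: a long plus a nullable string (a union).
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"long\"},"
        + "{\"name\":\"nick\",\"type\":[\"null\",\"string\"]}]}");

    // Encode with JsonEncoder; unlike toString(), the union branch is tagged.
    static String toAvroJson(GenericRecord record) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Encoder encoder = EncoderFactory.get().jsonEncoder(SCHEMA, out);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(record, encoder);
        encoder.flush(); // JsonEncoder buffers output until flushed
        return out.toString("UTF-8");
    }

    // Decode the JSON text back into a record, using the same schema.
    static GenericRecord fromAvroJson(String json) throws IOException {
        Decoder decoder = DecoderFactory.get().jsonDecoder(SCHEMA, json);
        return new GenericDatumReader<GenericRecord>(SCHEMA).read(null, decoder);
    }

    static GenericRecord sample() {
        GenericData.Record r = new GenericData.Record(SCHEMA);
        r.put("id", 42L);
        r.put("nick", "doug");
        return r;
    }

    public static void main(String[] args) throws IOException {
        GenericRecord original = sample();
        String json = toAvroJson(original);
        System.out.println("toString():  " + original);
        System.out.println("JsonEncoder: " + json);
        System.out.println("round trip equal: " + original.equals(fromAvroJson(json)));
    }
}
```

The trade-off is that the JsonEncoder output is only meaningful together with the schema, whereas toString() output is plain JSON that any tool can read.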

Doug
On Tue, Feb 5, 2013 at 11:53 AM, Public Network Services
<[EMAIL PROTECTED]> wrote:
> Folks,
>
> Suppose an application only needs to quickly examine the contents of a
> bunch of Avro data files (irrespective of binary or JSON encoding and
> without any prior knowledge of the schema or object structure). One
> approach is simply to extract the Avro records as JSON text records. A
> simple way to do this:
>
>  - Create a DataFileStream<GenericRecord>(FileInputStream,
>    GenericDatumReader<GenericRecord>) from a FileInputStream on the file.
>    (If the file is not an Avro data file, an IOException is thrown.)
>  - Read GenericRecord records from the DataFileStream object while its
>    hasNext() method returns true.
>  - Convert each GenericRecord read into a JSON string by calling its
>    toString() method.
>
> For the test datasets in the Avro 1.7.3 distribution, this actually works
> fine.
>
> My question is, does anyone see any potential problems for (binary or JSON
> encoded) Avro data files, given the above logic? For example, should the
> GenericRecord.toString() method always produce a valid JSON string?
>
> Thanks!
>
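The three steps in the quoted question can be sketched as follows, assuming Avro's Java API (1.7.x or later); the class name `DumpAvroFile` and the in-memory `Point` sample file are hypothetical stand-ins for a real file opened with a FileInputStream. GenericDatumReader is constructed without a schema, so the writer schema stored in the file header is used:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class DumpAvroFile {
    // Read every record without knowing the schema in advance.
    // The DataFileStream constructor throws IOException if the input
    // does not start with the Avro data file magic bytes.
    static List<String> dumpAsJson(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (DataFileStream<GenericRecord> stream =
                 new DataFileStream<>(in, new GenericDatumReader<GenericRecord>())) {
            while (stream.hasNext()) {
                lines.add(stream.next().toString());
            }
        }
        return lines;
    }

    // Build a tiny in-memory data file for the demo (hypothetical schema).
    static byte[] sampleFile() throws IOException {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Point\",\"fields\":["
            + "{\"name\":\"x\",\"type\":\"int\"},{\"name\":\"y\",\"type\":\"int\"}]}");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, out);
            for (int i = 0; i < 3; i++) {
                GenericRecord r = new GenericData.Record(schema);
                r.put("x", i);
                r.put("y", i * i);
                writer.append(r);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        for (String line : dumpAsJson(new ByteArrayInputStream(sampleFile()))) {
            System.out.println(line);
        }
    }
}
```

For a real file, pass `new FileInputStream(path)` to `dumpAsJson`. The printed lines carry the information loss described in the reply: record names and the int/long, float/double, string/enum and map/record distinctions are not recoverable from them.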
Public Network Services 2013-02-05, 23:30
Doug Cutting 2013-02-06, 00:00