Avro >> mail # user >> json2avro


Gregory 2013-06-19, 21:34
> I wrote a little C tool using Avro-C to convert JSON to Avro and thought
> maybe someone here might find it useful.
>
> https://github.com/grisha/json2avro
>
> The purpose is to be useful in converting messy legacy JSON in which
> some elements might be missing or of wrong type. Even though there is no
> schema resolution per se here, json2avro will attempt to use the default
> specified in the schema if the corresponding JSON element is missing and
> will try the types specified in a union until one succeeds.
>
> json2avro lets you pick from null, snappy, deflate and lzma codecs,
> specify a custom block size and optionally skip over JSON lines that it
> is unable to parse. I'm also thinking of adding a target max file size
> so that it would automatically split output into multiple files.

Very cool!  Kind of the reverse of avrocat or avropipe.  We could clean
it up and add it as another C command-line tool if you like.
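
For readers following along, the fallback behaviour described in the quoted
announcement (use the schema default when a JSON element is missing, otherwise
try each union branch in order until one accepts the value) might look roughly
like the sketch below. This is not json2avro's actual code: every type and
function name here (kind_t, json_elem_t, field_t, resolve) is hypothetical, and
branch matching is assumed to be an exact type match, which the real tool may
relax.

```c
#include <stddef.h>

/* Hypothetical, simplified value kinds; not the Avro-C or Jansson types. */
typedef enum { KIND_NULL, KIND_BOOLEAN, KIND_LONG, KIND_DOUBLE, KIND_STRING } kind_t;

/* A JSON element as a converter might see it; present == 0 means the key
 * was missing from the input object. */
typedef struct {
    int present;
    kind_t kind;
} json_elem_t;

/* A schema field: its union branches in declaration order, and whether the
 * schema declares a default for it. */
typedef struct {
    const kind_t *branches;
    size_t n_branches;
    int has_default;
} field_t;

typedef enum { USE_BRANCH, USE_DEFAULT, REJECT } action_t;

/* Decide how to fill one field.
 * - Missing element: fall back to the schema default if there is one.
 * - Present element: pick the first union branch whose kind matches
 *   (assumption: exact match only) and report its index in *branch_out.
 * - Otherwise: reject, so the caller can skip or report the record. */
action_t resolve(const json_elem_t *e, const field_t *f, size_t *branch_out)
{
    if (!e->present)
        return f->has_default ? USE_DEFAULT : REJECT;

    for (size_t i = 0; i < f->n_branches; i++) {
        if (f->branches[i] == e->kind) {
            *branch_out = i;
            return USE_BRANCH;
        }
    }
    return REJECT;
}
```

With a field declared as a union of null and string plus a default, a missing
element yields USE_DEFAULT, a string selects branch 1, and a number is
rejected.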

> It uses Jansson as the JSON parser which is conveniently bundled with
> Avro-C. (One thing that I'm not clear on is that Jansson cannot handle
> nulls, not sure if this is a Jansson-specific limitation or something
> inherent to JSON.)

Can you elaborate on this?  Jansson should support null JSON values
(it's the keyword null, not the string value "null").  And the Avro C
bindings should use that for Avro null values.
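
To make the keyword-versus-string distinction concrete, here is a minimal
sketch that classifies a single, already-trimmed JSON scalar token. It is
deliberately library-free and is not a JSON parser (no escape handling, no
whitespace skipping); tok_kind_t and classify are hypothetical names. In
Jansson itself the corresponding check on a parsed value is json_is_null().

```c
#include <string.h>

typedef enum { TOK_NULL, TOK_STRING, TOK_OTHER } tok_kind_t;

/* Classify one trimmed JSON scalar token.
 * The bare keyword null is the JSON null value; the same four letters
 * wrapped in double quotes are an ordinary string. */
tok_kind_t classify(const char *tok)
{
    size_t n = strlen(tok);

    if (strcmp(tok, "null") == 0)
        return TOK_NULL;
    if (n >= 2 && tok[0] == '"' && tok[n - 1] == '"')
        return TOK_STRING;
    return TOK_OTHER;
}
```

Here classify("null") gives TOK_NULL, while classify("\"null\"") gives
TOK_STRING, so only the first would be expected to map to an Avro null.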

cheers
–doug

Gregory 2013-06-25, 14:48
Douglas Creager 2013-06-25, 15:10