Avro, mail # user - json2avro
Re: json2avro
Douglas Creager 2013-06-25, 13:08
> I wrote a little C tool using Avro-C to convert JSON to Avro and thought
> maybe someone here would find it useful.
>
> https://github.com/grisha/json2avro
>
> The goal is to convert messy legacy JSON in which some elements might
> be missing or of the wrong type. Even though there is no schema
> resolution per se here, json2avro will use the default specified in the
> schema if the corresponding JSON element is missing, and will try each
> type specified in a union until one succeeds.
>
> json2avro lets you pick from the null, snappy, deflate and lzma codecs,
> specify a custom block size, and optionally skip JSON lines that it is
> unable to parse. I'm also thinking of adding a target max file size so
> that it would automatically split output into multiple files.

Very cool!  Kind of the reverse of avrocat or avropipe.  We could clean
it up and add it as another C command-line tool if you like.

> It uses Jansson as the JSON parser, which is conveniently bundled with
> Avro-C. (One thing I'm not clear on: Jansson cannot handle nulls, and
> I'm not sure whether this is a Jansson-specific limitation or something
> inherent to JSON.)

Can you elaborate on this?  Jansson should support null JSON values
(it's the keyword null, not the string value "null").  And the Avro C
bindings should use that for Avro null values.

cheers
–doug