NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Avro >> mail # user >> json2avro


Re: json2avro

On Tue, 25 Jun 2013, Douglas Creager wrote:

>> json2avro lets you pick from null, snappy, deflate and lzma codecs,
>> specify a custom block size and optionally skips over JSON lines that it
>> is unable to parse. I'm also thinking of adding a target max file size
>> so that it would automatically split output into multiple files.
>
> Very cool!  Kind of the reverse of avrocat or avropipe.  We could clean
> it up and add it as another C command-line tool if you like.

Sure, I'm all for it!
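The target-max-file-size idea could work roughly as follows. This is a minimal sketch, not json2avro code: the `splitter` struct and its functions are hypothetical names, sizes are counted per encoded Avro data block (a block is never split across files), and the bytes of each new file's header are ignored for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch (not json2avro code): decide which output file each
 * encoded Avro data block goes to, given a target maximum file size.
 * Sizes are in bytes; per-file header overhead is ignored. */
struct splitter {
    size_t max_file_size; /* target max size per output file */
    size_t current_size;  /* bytes placed in the current file so far */
    int file_index;       /* 0-based index of the current output file */
};

static void splitter_init(struct splitter *s, size_t max_file_size) {
    assert(max_file_size > 0);
    s->max_file_size = max_file_size;
    s->current_size = 0;
    s->file_index = 0;
}

/* Returns the index of the file the block should be written to.
 * A block is never split: if it would push a non-empty file past the
 * target, a new file is started. An oversized block gets a file of its own. */
static int splitter_place_block(struct splitter *s, size_t block_size) {
    if (s->current_size > 0 &&
        s->current_size + block_size > s->max_file_size) {
        s->file_index++;
        s->current_size = 0;
    }
    s->current_size += block_size;
    return s->file_index;
}
```

With a 100-byte target, blocks of 60, 30 and 20 bytes land in files 0, 0 and 1. Because blocks are kept whole, the limit is a target rather than a hard cap, which matches the "target max file size" wording.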

>> It uses Jansson as the JSON parser, which is conveniently bundled with
>> Avro-C. (One thing that I'm not clear on is that Jansson cannot handle
>> nulls; I'm not sure if this is a Jansson-specific limitation or something
>> inherent to JSON.)
>
> Can you elaborate on this?  Jansson should support null JSON values
> (it's the keyword null, not the string value "null").  And the Avro C
> bindings should use that for Avro null values.

Sorry, I wasn't clear. Jansson uses null-terminated strings. The docs state:
"Normal null terminated C strings are used, so JSON strings may not
contain embedded null characters." I've tested it, and indeed they cannot:
Jansson cannot parse a string like "abc\u0000def".

Grisha
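To make the embedded-null limitation concrete, here is a minimal illustration in plain C (not Jansson code; `byte_string` and the helper functions are hypothetical). The JSON string "abc\u0000def" decodes to 7 bytes, but anything that treats it as a NUL-terminated C string sees only "abc", so two different strings can even compare equal. Carrying an explicit length, as Avro's binary encoding does by prefixing each string with its byte length, avoids the problem.

```c
#include <stddef.h>
#include <string.h>

/* Decoded bytes of the JSON string "abc\u0000def": 7 bytes with an
 * embedded NUL at index 3 (the compiler appends a terminating NUL). */
static const char decoded[] = "abc\0def";
static const size_t decoded_len = sizeof decoded - 1; /* 7 */

/* Hypothetical length-carrying string: what a parser needs in order
 * to round-trip embedded NULs. */
struct byte_string {
    const char *data;
    size_t len;
};

static struct byte_string make_byte_string(const char *data, size_t len) {
    struct byte_string bs = { data, len };
    return bs;
}

/* C-string comparison stops at the first NUL byte. */
static int c_string_equal(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}

/* Length-aware comparison looks at every byte. */
static int byte_string_equal(struct byte_string a, struct byte_string b) {
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

Here `strlen(decoded)` is 3 even though the value holds 7 bytes, and `c_string_equal` treats "abc\0def" and "abc\0xyz" as the same string while `byte_string_equal` does not.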