Re: Avro Container file and JsonEncoding.
On 02/08/2012 07:14 AM, karthik ramachandran wrote:
> I'm trying to figure out if it's possible to create an Avro container
> file with JsonEncoding.  It doesn't appear to be:
> org.apache.avro.file.DataFileWriter seems to use a binary encoder by
> default.

That's correct.  Avro's data file format always uses the binary encoding.

> Is there another FileWriter class that I should be using?

There's not a FileWriter class for this, but it only takes a few lines
of code to write JSON format to a file.  It's probably a good idea to
include the schema as the first line of such files, e.g.:

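// Uses org.apache.avro.io.DatumWriter, Encoder and EncoderFactory.
OutputStream out = new FileOutputStream(<file>);
try {
  // Header line: the schema as JSON text, followed by a newline.
  out.write((<schema>+"\n").getBytes("UTF-8"));
  // jsonEncoder is an instance method, so get the factory with get().
  Encoder encoder = EncoderFactory.get().jsonEncoder(<schema>, out);
  DatumWriter writer = new Specific/GenericDatumWriter(<schema>);
  while (<more>) {
    writer.write(<next>, encoder);
  }
  // The encoder buffers its output, so flush it before closing the stream.
  encoder.flush();
} finally {
  out.close();
}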

Perhaps we should add a Java FileWriter interface to Avro, like the
FileReader interface we already have, then implement JsonFileWriter and
JsonFileReader using the above format (schema on first line, line per
item).  If that's of interest, please file an issue in Jira.
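
To make the proposed layout concrete (schema on the first line, then one JSON item per line), here is a rough sketch of the reading side using only the JDK. JsonLinesLayout, Contents and split are hypothetical names, not Avro classes, and the sketch assumes every item sits on exactly one line. It only separates the header from the items. Actual decoding would presumably parse the schema with Schema.Parser and pass each item to DecoderFactory.get().jsonDecoder(schema, line).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for the "schema line + one JSON item per line" layout; not part of Avro. */
final class JsonLinesLayout {

    /** The schema text from the first line and the raw JSON text of each item. */
    static final class Contents {
        final String schemaJson;
        final List<String> records;

        Contents(String schemaJson, List<String> records) {
            this.schemaJson = schemaJson;
            this.records = records;
        }
    }

    /** Splits the input into its schema header and item lines (assumes one item per line). */
    static Contents split(Reader in) {
        try {
            BufferedReader reader = new BufferedReader(in);
            String schemaJson = reader.readLine();
            if (schemaJson == null) {
                throw new IllegalArgumentException("empty input: missing schema line");
            }
            List<String> records = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) { // tolerate a trailing blank line
                    records.add(line);
                }
            }
            return new Contents(schemaJson, records);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample input: a "string" schema followed by two string items.
        String sample = "{\"type\":\"string\"}\n\"first\"\n\"second\"\n";
        Contents c = split(new StringReader(sample));
        System.out.println("schema: " + c.schemaJson);
        System.out.println("items:  " + c.records.size());
    }
}
```

For a real file, the Reader could be an InputStreamReader over a FileInputStream with the UTF-8 charset, matching the encoding used when writing the schema line.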

Doug