-
Avro Container file and JsonEncoding.
karthik ramachandran 2012-02-08, 15:14
Hi,
I'm trying to figure out if it's possible to create an Avro container file with JsonEncoding. It doesn't appear to be: org.apache.avro.file.DataFileWriter seems to use a binary encoder by default.
Is there another FileWriter class that I should be using?
Karthik
-- Karthik Ramachandran
-
Re: Avro Container file and JsonEncoding.
Doug Cutting 2012-02-08, 17:41
On 02/08/2012 07:14 AM, karthik ramachandran wrote:
> I'm trying to figure out if it's possible to create an Avro container
> file with JsonEncoding. It doesn't appear to be:
> org.apache.avro.file.DataFileWriter seems to use a binary encoder by
> default.
That's correct. Avro's data file format always uses the binary encoding.
> Is there another FileWriter class that I should be using?
There's not a FileWriter class for this, but it only takes a few lines of code to write JSON format to a file. It's probably a good idea to include the schema as the first line of such files, e.g.:
OutputStream out = new FileOutputStream(<file>);
try {
  // Schema goes on the first line of the file.
  out.write((<schema>+"\n").getBytes("UTF-8"));
  Encoder encoder = EncoderFactory.get().jsonEncoder(<schema>, out);
  DatumWriter writer = new Specific/GenericDatumWriter(<schema>);
  while (<more>) {
    writer.write(<next>, encoder);
  }
  encoder.flush();
} finally {
  out.close();
}
Perhaps we should add a Java FileWriter interface to Avro, like the FileReader interface we already have, then implement JsonFileWriter and JsonFileReader using the above format (schema on first line, line per item). If that's of interest, please file an issue in Jira.
Doug
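As a rough illustration of the layout proposed above (schema on the first line, one item per line), here is a minimal sketch of how a reader might split such a file. It uses only the JDK and does not call Avro at all; the class name JsonLinesAvroFile and its methods are hypothetical and not part of any Avro release. It assumes the schema was written as a single line and that each JSON datum sits on its own line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not an Avro class): splits a text file laid out as
 * "schema JSON on line 1, one JSON-encoded datum per following line".
 */
final class JsonLinesAvroFile {
    private final String schemaJson;     // raw text of the first line
    private final List<String> records;  // raw text of each non-blank data line

    private JsonLinesAvroFile(String schemaJson, List<String> records) {
        this.schemaJson = schemaJson;
        this.records = Collections.unmodifiableList(records);
    }

    /** Reads the whole input; fails if the schema line is missing or blank. */
    static JsonLinesAvroFile read(Reader in) {
        try {
            BufferedReader reader = new BufferedReader(in);
            String schema = reader.readLine();
            if (schema == null || schema.trim().isEmpty()) {
                throw new IllegalArgumentException("missing schema on first line");
            }
            List<String> records = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {  // tolerate blank lines between items
                    records.add(line);
                }
            }
            return new JsonLinesAvroFile(schema, records);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String schemaJson() {
        return schemaJson;
    }

    List<String> records() {
        return records;
    }
}
```

In a real reader, the schema line would presumably be handed to Avro's schema parser and each record line to a JSON decoder built from that schema; this sketch stops at splitting the text so that it stays independent of the Avro jar.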
-
Re: Avro Container file and JsonEncoding.
Scott Carey 2012-02-08, 17:57
On 2/8/12 7:14 AM, "karthik ramachandran" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm trying to figure out if it's possible to create an Avro container file with
> JsonEncoding. It doesn't appear to be: org.apache.avro.file.DataFileWriter
> seems to use a binary encoder by default.
One thing to note is that if you write it to an Avro container file in binary it will be significantly smaller. You can extract the contents as JSON using either the C command line tools or the Java 'tojson' tool. If the reason you want it in JSON is for human readability, this is all you need.
For example, I often do the following:
java -jar avro-tools.jar tojson my_avro_file.avro | grep ...
or pipe it to other tools to view or interpret as JSON.
> Is there another FileWriter class that I should be using?
See Doug's comments. It doesn't make sense to store JSON in an Avro Data File because the file format is delimited with binary markers and contains binary metadata.
-
Re: Avro Container file and JsonEncoding.
karthik ramachandran 2012-02-08, 18:06
I'm in the process of writing/debugging a MapReduce job, and the Avro MapRed API seems to require that the input file be a proper Avro container file. I was hoping to be able to use the AvroMapper interface, feeding it a JSON file just as a debugging step. That way I could use vi to modify values in the JSON structure. However, if the Avro file format has binary delimiters, then this is probably not a viable approach.
Thanks,
Karthik
-
Re: Avro Container file and JsonEncoding.
Scott Carey 2012-02-08, 18:55
AvroJob by default uses AvroInputFormat, which reads Avro Data Files. You can write your own InputFormat that returns Avro objects if you wish, but you will be overriding more and more of the Avro MapReduce implementation.
If you have use cases that need easier configuration, debugging, or greater flexibility, please capture the request and use cases in a JIRA ticket. It will be useful for others who choose to volunteer time to enhance that part of Avro.
Thanks!