Avro, mail # user - Embedding schema with binary encoding


Pratyush Chandra 2013-01-07, 12:42
Scott Carey 2013-01-08, 09:26
Pratyush Chandra 2013-01-08, 09:49
Scott Carey 2013-01-09, 19:22
Re: Embedding schema with binary encoding
Pratyush Chandra 2013-01-09, 19:47
Thanks Scott. I have now realized that the default encoding is binary, not JSON.
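
For anyone finding this thread later, here is a minimal sketch of the DataFileWriter approach Scott describes, written against the Avro Java API. The record name "Message", the class name and the helper method names are hypothetical; only the "to" field comes from the original snippet. The deflate codec and its level are optional choices, not requirements.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroContainerFileSketch {
    // Hypothetical schema: a record with the single string field "to"
    // used in the original snippet.
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"Message\","
      + "\"fields\":[{\"name\":\"to\",\"type\":\"string\"}]}";

    // Writes one record to an Avro container file. The schema goes into the
    // file header as JSON; the record itself is written in Avro binary form.
    static void write(File file, Schema schema) throws IOException {
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            // Optional: the codec must be set before create() is called.
            writer.setCodec(CodecFactory.deflateCodec(6));
            writer.create(schema, file);

            GenericRecord message1 = new GenericData.Record(schema);
            message1.put("to", "Alyssa");
            writer.append(message1);
        }
    }

    // Reads the first record back using only the schema embedded in the file.
    static String readFirstTo(File file) throws IOException {
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            // reader.getSchema() returns the schema parsed from the file header.
            GenericRecord record = reader.next();
            // String values come back as CharSequence (Utf8), so convert explicitly.
            return record.get("to").toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        File file = File.createTempFile("messages", ".avro");
        write(file, schema);
        System.out.println(readFirstTo(file));
    }
}
```

The reader is constructed without a schema, so the file is decoded with the one stored in its header, which is the behaviour asked about in this thread.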

On Thu, Jan 10, 2013 at 12:52 AM, Scott Carey <[EMAIL PROTECTED]> wrote:

> An Avro file always stores the schema in JSON form in the header.
> There may be an old JIRA ticket considering the possibility of writing the
> schema in a more compact form. The data in the file is always encoded in
> Avro binary form, optionally with snappy or deflate (gzip) compression and
> with a variable block size.
>
> On 1/8/13 1:49 AM, "Pratyush Chandra" <[EMAIL PROTECTED]> wrote:
>
> Hi Scott,
>
> I am able to find an example of JSON encoding with DataFileWriter, which
> embeds the schema, but I am unable to find a DataFileWriter example for
> binary encoding with the schema.
>
> Thanks
> Pratyush
>
> On Tue, Jan 8, 2013 at 2:56 PM, Scott Carey <[EMAIL PROTECTED]> wrote:
>
>> Calling toString() on a Schema will print it in JSON form. However, you most
>> likely do not want to invent your own file format for Avro data.
>>
>> Use DataFileWriter, which will manage the schema for you, along with
>> compression, metadata, and the ability to seek to the middle of the file.
>> Additionally, the file is then readable by several other languages and tools.
>>
>> On 1/7/13 4:42 AM, "Pratyush Chandra" <[EMAIL PROTECTED]> wrote:
>>
>> I am able to serialize with binary encoding to a file using the following:
>>         FileOutputStream outputStream = new FileOutputStream(file);
>>         Encoder e = EncoderFactory.get().binaryEncoder(outputStream, null);
>>         DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
>>         GenericRecord message1 = new GenericData.Record(schema);
>>         message1.put("to", "Alyssa");
>>         datumWriter.write(message1, e);
>>         e.flush();
>>         outputStream.close();
>>
>> But the output file contains only the serialized data and not the schema.
>> How can I add the schema as well?
>>
>> Thanks
>> Pratyush Chandra
>>
>>
>
>
> --
> Pratyush Chandra
>
>
--
Pratyush Chandra
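
To illustrate why the raw binaryEncoder output in the original question carries no schema, here is a rough sketch that reproduces the bytes by hand using only the JDK. It assumes the schema is a record with a single string field "to" (the real schema was not posted), and relies on the Avro specification's rules that a string is a zig-zag varint length followed by UTF-8 bytes and that a record is the concatenation of its fields. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class RawAvroBytesSketch {
    // Zig-zag then variable-length encode a long, as the Avro spec
    // describes for the int and long types.
    static void writeLong(long n, ByteArrayOutputStream out) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // An Avro string is its UTF-8 byte length (as a long) followed by the bytes.
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(utf8.length, out);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A record with one string field encodes to just that field's bytes.
        StringBuilder hex = new StringBuilder();
        for (byte b : encodeString("Alyssa")) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim()); // prints "0c 41 6c 79 73 73 61"
    }
}
```

Those seven bytes contain no field names or types, so a reader cannot decode them without being given the schema separately. That is the gap the container file header written by DataFileWriter fills.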