Avro >> mail # user >> Embedding schema with binary encoding


Re: Embedding schema with binary encoding
Thanks, Scott. I also realized that the default is binary encoding, not JSON.

On Thu, Jan 10, 2013 at 12:52 AM, Scott Carey <[EMAIL PROTECTED]> wrote:

> An Avro file always stores the schema in JSON form in the header.
> There may be an old JIRA ticket considering the possibility of writing the
> schema in a more compact form. The data in the file is always encoded in
> Avro binary form, optionally with snappy or deflate (gzip) compression and
> with a variable block size.
>
> On 1/8/13 1:49 AM, "Pratyush Chandra" <[EMAIL PROTECTED]> wrote:
>
> Hi Scott,
>
> I am able to find an example of JSON encoding with DataFileWriter, which
> embeds the schema, but I am unable to find a DataFileWriter example for
> binary encoding with the schema.
>
> Thanks
> Pratyush
>
> On Tue, Jan 8, 2013 at 2:56 PM, Scott Carey <[EMAIL PROTECTED]> wrote:
>
>> Calling toJson() on a Schema will print it in JSON form. However, you most
>> likely do not want to invent your own file format for Avro data.
>>
>> Use DataFileWriter, which will manage the schema for you, along with
>> compression, metadata, and the ability to seek to the middle of the file.
>> Additionally, the file is then readable by several other languages and tools.
>>
>> On 1/7/13 4:42 AM, "Pratyush Chandra" <[EMAIL PROTECTED]> wrote:
>>
>> I am able to serialize with binary encoding to a file using the following:
>>         FileOutputStream outputStream = new FileOutputStream(file);
>>         Encoder e = EncoderFactory.get().binaryEncoder(outputStream, null);
>>         DatumWriter<GenericRecord> datumWriter =
>>                 new GenericDatumWriter<GenericRecord>(schema);
>>         GenericRecord message1 = new GenericData.Record(schema);
>>         message1.put("to", "Alyssa");
>>         datumWriter.write(message1, e);
>>         e.flush();
>>         outputStream.close();
>>
>> But the output file contains only the serialized data and not the schema.
>> How can I also add the schema?
>>
>> Thanks
>> Pratyush Chandra
>>
>>
>
>
> --
> Pratyush Chandra
>
>
--
Pratyush Chandra
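
For anyone reading this thread later, here is a minimal sketch of the DataFileWriter approach Scott describes: the schema goes into the file header as JSON, while the records themselves are stored in Avro binary form. The class name, the one-field "Message" schema, and the file name are assumptions made up for illustration (only the "to" field and the "Alyssa" value come from the original snippet), and the deflate codec is optional. It assumes the Avro Java library is on the classpath.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class EmbeddedSchemaExample {

    // Hypothetical schema: a record with the single "to" field used in the thread.
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"Message\","
        + "\"fields\":[{\"name\":\"to\",\"type\":\"string\"}]}";

    // Writes one record to an Avro container file. The header holds the
    // schema as JSON; the record data is written in Avro binary encoding.
    static void write(File file, Schema schema) throws IOException {
        GenericRecord message1 = new GenericData.Record(schema);
        message1.put("to", "Alyssa");

        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            // Optional compression; must be set before create().
            writer.setCodec(CodecFactory.deflateCodec(6));
            writer.create(schema, file);
            writer.append(message1);
        }
    }

    // Reads the first record back without supplying a schema: the reader
    // picks up the schema embedded in the file header.
    static GenericRecord readFirst(File file) throws IOException {
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            return reader.next();
        }
    }

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        File file = new File("message.avro"); // hypothetical output path

        write(file, schema);

        GenericRecord record = readFirst(file);
        System.out.println("Embedded schema: " + record.getSchema());
        System.out.println("to = " + record.get("to"));
    }
}
```

The reader is constructed without a schema on purpose, which shows that the file is self-describing. The raw binaryEncoder approach from the original question produces only the record bytes, so a reader of that output needs the schema from somewhere else.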