NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Avro >> mail # user >> Decode without using DataFileReader


Re: Decode without using DataFileReader
The DataFile file format stores the schema as part of its header.
That's one of its advantages.
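Because that header travels with the data, the container format does not strictly need a File. Here is a sketch that assumes the bytes were produced by DataFileWriter (the Avro object container format) and simply arrive over some other InputStream. In that case DataFileStream can take the schema from the stream's own header and then iterate over the records. The class name StreamRead, the helper writeSample() and the Person schema are hypothetical and only there for illustration. If the sender wrote raw datums with a BinaryEncoder, the stream has no header and this approach will not work.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;

public class StreamRead {

  // Reads every record from a stream in the Avro container format.
  // No schema is passed in: DataFileStream reads it from the stream header.
  public static int readAll(InputStream in) throws IOException {
    DataFileStream<GenericData.Record> stream = new DataFileStream<GenericData.Record>(
        in, new GenericDatumReader<GenericData.Record>());
    int count = 0;
    try {
      System.out.println("Schema: " + stream.getSchema());
      GenericData.Record record = null;
      while (stream.hasNext()) {
        record = stream.next(record); // reuse the record object between reads
        System.out.println("Name " + record.get("name") + " Age " + record.get("age"));
        count++;
      }
    } finally {
      stream.close();
    }
    return count;
  }

  // Hypothetical sender: writes one record in the container format to a byte array.
  public static byte[] writeSample() throws IOException {
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Person\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}");
    GenericData.Record person = new GenericData.Record(schema);
    person.put("name", "Bob");
    person.put("age", 42);

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataFileWriter<GenericData.Record> writer = new DataFileWriter<GenericData.Record>(
        new GenericDatumWriter<GenericData.Record>(schema));
    writer.create(schema, bytes); // writes the header, including the schema
    writer.append(person);
    writer.close();
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    readAll(new java.io.ByteArrayInputStream(writeSample()));
  }
}
```

DataFileReader builds on DataFileStream and adds seeking, which is why it wants a File or SeekableInput. DataFileStream only reads forward, and that is all a pipe or socket allows.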

The encoder/decoder operate at a lower level and do not do that. You need to
manage the schema yourself if you choose to use the encoder/decoder
instead of the data file format (why?). The source stream can't contain the
schema if you never store it there, and it makes no sense for the encoder to
write the schema alongside every single record it puts into a stream.
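If you stay with the raw encoder/decoder, one way to manage the schema yourself is to send it once, ahead of the data, on the same stream. The sketch below uses a framing invented just for this example: the schema's JSON text is written as an Avro string, followed by a single binary-encoded record. Avro does not define this format, so both sides would have to agree on it. SchemaPrefixedStream, samplePerson() and the Person schema are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemaPrefixedStream {

  // Hypothetical example schema, used only to build a sample record.
  static final String PERSON_SCHEMA =
      "{\"type\":\"record\",\"name\":\"Person\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}";

  public static GenericData.Record samplePerson() {
    GenericData.Record person =
        new GenericData.Record(new Schema.Parser().parse(PERSON_SCHEMA));
    person.put("name", "Alice");
    person.put("age", 30);
    return person;
  }

  // Hypothetical framing: [schema JSON as an Avro string][one binary-encoded record].
  public static byte[] write(GenericData.Record record) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    // The raw encoder never writes the schema, so we store it ourselves first.
    encoder.writeString(record.getSchema().toString());
    new GenericDatumWriter<GenericData.Record>(record.getSchema()).write(record, encoder);
    encoder.flush(); // the binary encoder is buffered
    return out.toByteArray();
  }

  public static GenericData.Record read(byte[] buf) throws IOException {
    BinaryDecoder decoder =
        DecoderFactory.get().binaryDecoder(new ByteArrayInputStream(buf), null);
    // Recover the schema first, then use it to decode the record that follows.
    Schema schema = new Schema.Parser().parse(decoder.readString(null).toString());
    GenericDatumReader<GenericData.Record> datumReader =
        new GenericDatumReader<GenericData.Record>(schema);
    return datumReader.read(null, decoder);
  }

  public static void main(String[] args) throws IOException {
    GenericData.Record copy = read(write(samplePerson()));
    System.out.println("Name " + copy.get("name") + " Age " + copy.get("age"));
  }
}
```

To send more than one record, you would write the schema once and then loop over the datums. At that point you are rebuilding what the container format already gives you, along with sync markers and optional compression.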

On Mon, Dec 5, 2011 at 11:18 PM, Gaurav Nanda <[EMAIL PROTECTED]> wrote:
> I guess I did not put it the right way.
>
> See this sample code:
> ---------------------------------------
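> import java.io.File;
> import java.io.IOException;
> import org.apache.avro.file.DataFileReader;
> import org.apache.avro.generic.GenericData;
> import org.apache.avro.generic.GenericDatumReader;
>
> public static void testRead(File file) throws IOException {
>   // No schema is passed in: DataFileReader takes it from the file header.
>   GenericDatumReader<GenericData.Record> datum =
>       new GenericDatumReader<GenericData.Record>();
>   DataFileReader<GenericData.Record> reader =
>       new DataFileReader<GenericData.Record>(file, datum);
>
>   GenericData.Record record = new GenericData.Record(reader.getSchema());
>   while (reader.hasNext()) {
>     // Reuse the same record object for every datum in the file.
>     reader.next(record);
>     System.out.println("Name " + record.get("name") + " Age " + record.get("age"));
>   }
>
>   reader.close();
> }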
> -------------------------------
> This takes a file as input, which contains both the schema and the actual data.
> In my case, instead of a file, I have some other stream of
> schema and data, which I pass to the DecodeData() function.
>
> So the question now is: how do I extract the schema from that stream?
>
> Thanks,
> Gaurav Nanda
>
> On Mon, Dec 5, 2011 at 9:40 PM, Matt Stevenson
> <[EMAIL PROTECTED]> wrote:
>> No, the schema needs to be present in some form to tell the reader how to
>> decode the data.
>> You can generate classes from the schema and pass in the class, but that is
>> just a different way of passing in the schema.
>>
>>
>> On Mon, Dec 5, 2011 at 9:33 AM, Gaurav <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi,
>>>
>>> I am trying to read a byte stream of encoded data, which comes from some
>>> source other than a file. So I should not use DataFileReader.
>>>
>>> I wrote the following code to do that, but here I have to specify the schema
>>> myself, when ideally it should come from the data itself. Is there any other
>>> way to decode the data without explicitly specifying the schema and without
>>> using DataFileReader?
>>> ----------------------
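>>> import java.io.ByteArrayInputStream;
>>> import java.io.IOException;
>>> import org.apache.avro.Schema;
>>> import org.apache.avro.generic.GenericData;
>>> import org.apache.avro.generic.GenericDatumReader;
>>> import org.apache.avro.io.BinaryDecoder;
>>> import org.apache.avro.io.DecoderFactory;
>>>
>>> private static void DecodeData(byte[] buf) throws IOException {
>>>   // The raw bytes carry no schema, so it must be supplied here;
>>>   // createSchema() is my own helper that builds it.
>>>   Schema schema = createSchema();
>>>   GenericDatumReader<GenericData.Record> datum =
>>>       new GenericDatumReader<GenericData.Record>(schema);
>>>
>>>   ByteArrayInputStream in = new ByteArrayInputStream(buf);
>>>   BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(in, null);
>>>
>>>   GenericData.Record record = new GenericData.Record(datum.getSchema());
>>>   datum.read(record, decoder);
>>>
>>>   System.out.println(record.get("trade"));
>>> }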
>>> ---------------------
>>>
>>> Thanks,
>>> Gaurav Nanda
>>>
>>> --
>>> View this message in context:
>>> http://apache-avro.679487.n3.nabble.com/Decode-without-using-DataFileReader-tp3561722p3561722.html
>>> Sent from the Avro - Users mailing list archive at Nabble.com.
>>
>>
>>
>>
>> --
>> Matt Stevenson.
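Matt's point, that passing in a class is just another way of passing in the schema, can be illustrated with Avro's reflection support. The sketch below substitutes ReflectDatumWriter/ReflectDatumReader on a plain, hypothetical Person class for the generated classes Matt mentions, which would be read through SpecificDatumReader in a similar way. In both cases the reader derives its schema from the class, so no schema travels with the bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumReader;
import org.apache.avro.reflect.ReflectDatumWriter;

public class ClassAsSchema {

  // Hypothetical plain Java class; Avro derives a record schema from its fields.
  public static class Person {
    String name;
    int age;
  }

  public static byte[] encode(Person p) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    // Only the field values are written; the schema stays implicit in the class.
    new ReflectDatumWriter<Person>(Person.class).write(p, encoder);
    encoder.flush();
    return out.toByteArray();
  }

  public static Person decode(byte[] buf) throws IOException {
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(buf, null);
    // Passing Person.class is how the reader learns the schema.
    return new ReflectDatumReader<Person>(Person.class).read(null, decoder);
  }

  public static void main(String[] args) throws IOException {
    Schema schema = ReflectData.get().getSchema(Person.class);
    System.out.println(schema); // the schema implied by the class
    Person p = new Person();
    p.name = "Carol";
    p.age = 25;
    Person back = decode(encode(p));
    System.out.println("Name " + back.name + " Age " + back.age);
  }
}
```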

--
Harsh J