Avro >> mail # user >> Decode without using DataFileReader


Gaurav 2011-12-05, 15:33
Matt Stevenson 2011-12-05, 16:10
Gaurav Nanda 2011-12-05, 17:48
Re: Decode without using DataFileReader
The data file format stores the schema as part of its header.
That's one of its advantages.

The encoder/decoder are lower-level APIs and do not do that. If you
choose to use the encoder/decoder instead of the data file format
(why?), you need to manage the schema yourself. The source stream can't
contain the schema unless you stored it there, and it would make no
sense for the encoder to write the schema into the stream with every
record.
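
One way to manage the schema yourself, sketched here with the Java standard library only, is to write the schema JSON once at the head of the stream and the encoded records after it. The framing (a 4-byte length followed by UTF-8 JSON) and the names SchemaFramedStream, frame, and unframe are assumptions made for illustration; this is not Avro's data file format. In real use the recovered JSON would presumably be handed to Avro's Schema.Parser and the payload to a BinaryDecoder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical framing, NOT Avro's data file format:
// [4-byte schema length][schema JSON as UTF-8][encoded records...]
final class SchemaFramedStream {

    /** The two parts recovered from a framed buffer. */
    static final class Unframed {
        final String schemaJson;
        final byte[] payload;

        Unframed(String schemaJson, byte[] payload) {
            this.schemaJson = schemaJson;
            this.payload = payload;
        }
    }

    /** Writes the schema once, followed by the already-encoded records. */
    static byte[] frame(String schemaJson, byte[] encodedRecords) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            byte[] schemaBytes = schemaJson.getBytes(StandardCharsets.UTF_8);
            out.writeInt(schemaBytes.length);
            out.write(schemaBytes);
            out.write(encodedRecords);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Splits a framed buffer back into schema JSON and record bytes. */
    static Unframed unframe(byte[] framed) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
            byte[] schemaBytes = new byte[in.readInt()];
            in.readFully(schemaBytes);
            // Everything after the schema is the encoded record data.
            byte[] payload = new byte[in.available()];
            in.readFully(payload);
            return new Unframed(new String(schemaBytes, StandardCharsets.UTF_8), payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String schemaJson = "{\"type\":\"record\",\"name\":\"Person\",\"fields\":["
                + "{\"name\":\"name\",\"type\":\"string\"},"
                + "{\"name\":\"age\",\"type\":\"int\"}]}";
        // Stand-in for bytes that a BinaryEncoder would have produced.
        byte[] encodedRecords = {0x06, 'B', 'o', 'b', 0x3c};

        Unframed parts = unframe(frame(schemaJson, encodedRecords));
        System.out.println("schema:  " + parts.schemaJson);
        System.out.println("payload: " + parts.payload.length + " bytes");
    }
}
```

If the stream already holds data in the data file format, Avro's DataFileStream can read it from any InputStream, so hand-rolled framing like this is only needed for raw encoder output.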

On Mon, Dec 5, 2011 at 11:18 PM, Gaurav Nanda <[EMAIL PROTECTED]> wrote:
> I guess I did not put it the right way.
>
> See this sample code:
> ---------------------------------------
> public static void testRead(File file) throws IOException {
>    GenericDatumReader<GenericData.Record> datum = new GenericDatumReader<GenericData.Record>();
>    DataFileReader<GenericData.Record> reader = new DataFileReader<GenericData.Record>(file, datum);
>
>    GenericData.Record record = new GenericData.Record(reader.getSchema());
>    while (reader.hasNext()) {
>      reader.next(record);
>      System.out.println("Name " + record.get("name") + " Age " + record.get("age"));
>    }
>
>    reader.close();
> }
> -------------------------------
> This takes a file as input, which contains both the schema and the actual data.
> In my case, instead of a file, I have some other stream of
> schema & data which I am passing to the DecodeData() function.
>
> So the question now is, how do I extract the schema from there?
>
> Thanks,
> Gaurav Nanda
>
> On Mon, Dec 5, 2011 at 9:40 PM, Matt Stevenson
> <[EMAIL PROTECTED]> wrote:
>> No, the schema needs to be present in some form to tell the reader how to
>> decode the data.
>> You can generate classes from the schema and pass in the class, but that is
>> just a different way of passing in the schema.
>>
>>
>> On Mon, Dec 5, 2011 at 9:33 AM, Gaurav <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi,
>>>
>>> I am trying to read a byte stream of encoded data that is coming from some
>>> source other than a file, so I should not use DataFileReader.
>>>
>>> I wrote the following code to do that, but here I have to specify the schema
>>> on my own, which ideally should come from the data itself. Is there any other
>>> way to decode the data without explicitly specifying the schema and without
>>> using DataFileReader?
>>> ----------------------
>>> private static void DecodeData(byte[] buf) throws IOException {
>>>     Schema schema = createSchema();
>>>     GenericDatumReader<GenericData.Record> datum = new GenericDatumReader<GenericData.Record>(schema);
>>>
>>>     ByteArrayInputStream in = new ByteArrayInputStream(buf);
>>>     BinaryDecoder decoder = DECODER_FACTORY.binaryDecoder(in, null);
>>>
>>>     GenericData.Record record = new GenericData.Record(datum.getSchema());
>>>     datum.read(record, decoder);
>>>
>>>     System.out.println(record.get("trade"));
>>> }
>>> ---------------------
>>>
>>> Thanks,
>>> Gaurav Nanda
>>>
>>> --
>>> View this message in context:
>>> http://apache-avro.679487.n3.nabble.com/Decode-without-using-DataFileReader-tp3561722p3561722.html
>>> Sent from the Avro - Users mailing list archive at Nabble.com.
>>
>>
>>
>>
>> --
>> Matt Stevenson.

--
Harsh J
Gaurav 2011-12-05, 18:13
Harsh J 2011-12-05, 18:44