Avro user mailing list: Deserialize the attributes data using another schema gives me wrong results


Raihan Jamal 2013-09-26, 00:10
Eric Wasserman 2013-09-26, 00:30
Raihan Jamal 2013-09-26, 00:42
Re: Deserialize the attributes data using another schema gives me wrong results
@Eric/Doug/Mika... Any thoughts on my previous question?
Thanks for the help....
*Raihan Jamal*
On Wed, Sep 25, 2013 at 5:42 PM, Raihan Jamal <[EMAIL PROTECTED]> wrote:

> Thanks, Eric. Now I have a couple of questions on this:
>
> 1) So does that mean we cannot deserialize an attribute's data using any other
> schema on its own? We always need to pass the schema that was used for writing
> along with whatever other schema we want to use for reading? Is that right?
> 2) Is there any way I can deserialize an attribute's data using another
> schema without passing the actual schema that was used to serialize it?
>
> In my example, as you can see, I am already storing a schemaId in the Avro
> record that maps to an actual schema name. So while serializing an
> attribute's data, we also store that schemaId within the Avro binary
> encoded value, and the schemaId indicates which schema was used to
> serialize it. Then, while deserializing the attribute, we would first grab
> the schemaId (by deserializing it with another schema), see which schema
> was actually used to serialize the attribute, and then deserialize the
> attribute again using that actual schema...
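A minimal sketch of that two-pass idea is below. It assumes, purely for illustration, that `schemaId` is the first field of every schema version (in the `schema2.avsc` quoted further down it is not, so the fields would need reordering or the id would have to live outside the Avro body), that the encoded value is in a byte array `bytes`, and that a hypothetical `schemaRegistry` map resolves an id to the full writer's schema. As Eric points out in the reply quoted below, binary Avro can only be decoded against the exact schema that wrote it, which is why the small prefix schema is needed at all.

    // Hypothetical sketch; uses org.apache.avro.Schema, org.apache.avro.generic.*,
    // and org.apache.avro.io.*. Assumes "schemaId" is the first field of every version.
    Schema prefixSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Prefix\",\"fields\":["
        + "{\"name\":\"schemaId\",\"type\":\"int\"}]}");

    // Pass 1: decode only the leading schemaId field of the binary value.
    Decoder d1 = DecoderFactory.get().binaryDecoder(bytes, null);
    GenericRecord prefix = new GenericDatumReader<GenericRecord>(prefixSchema).read(null, d1);
    int schemaId = (Integer) prefix.get("schemaId");

    // Look up the full writer's schema; schemaRegistry is a hypothetical Map<Integer, Schema>.
    Schema writerSchema = schemaRegistry.get(schemaId);

    // Pass 2: decode the whole record with the exact schema that wrote it.
    Decoder d2 = DecoderFactory.get().binaryDecoder(bytes, null);
    GenericRecord full = new GenericDatumReader<GenericRecord>(writerSchema).read(null, d2);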
>
>
> *Raihan Jamal*
>
>
> On Wed, Sep 25, 2013 at 5:30 PM, Eric Wasserman <[EMAIL PROTECTED]> wrote:
>
>>  Short answer. Use this constructor instead:
>>
>>  /** Construct given writer's and reader's schema. */
>>
>>   public GenericDatumReader(Schema writer, Schema reader) {
>>
>>  Longer answer:
>>
>>  You have to give the GenericDatumReader the EXACT schema that wrote the
>> bytes that you are trying to parse ("writer's schema").
>> You can *also* give it another schema you'd like to use ("reader's
>> schema") that can be different.
>>
>>
>>  Try changing this line of your code:
>>
>>  GenericDatumReader<GenericRecord> r1 = new GenericDatumReader<GenericRecord>(schema1);
>>
>>  To this:
>>
>>  GenericDatumReader<GenericRecord> r1 = new GenericDatumReader<GenericRecord>(schema2, schema1); // writer's schema is "schema2", reader's schema is "schema1"
>>
>>
>>  ------------------------------
>> *From:* Raihan Jamal <[EMAIL PROTECTED]>
>> *Sent:* Wednesday, September 25, 2013 5:10 PM
>> *To:* [EMAIL PROTECTED]
>> *Subject:* Deserialize the attributes data using another schema gives me wrong results
>>
>>   I am trying to serialize one of our attributes' data using an Apache Avro
>> schema. Here the attribute name is `e7`, and the schema that I am using to
>> serialize it is `schema2.avsc`, shown below.
>>
>>     {
>>       "namespace": "com.avro.test.AvroExperiment",
>>       "type": "record",
>>       "name": "DEMOGRAPHIC",
>>       "doc": "DEMOGRAPHIC data",
>>       "fields": [
>>         {"name": "dob", "type": "string"},
>>         {"name": "gndr", "type": "string"},
>>         {"name": "occupation", "type": "string"},
>>         {"name": "mrtlStatus", "type": "string"},
>>         {"name": "numChldrn", "type": "int"},
>>         {"name": "estInc", "type": "string"},
>>         {"name": "schemaId", "type": "int"},
>>         {"name": "lmd", "type": "long"}
>>       ]
>>     }
>>
>>  Below is the code that I am using to serialize the attribute `e7` with the
>> above Avro schema `schema2.avsc`. I am able to serialize it properly and it
>> works fine...
>>  Schema schema = new Parser().parse(AvroExperiment.class.getResourceAsStream("/schema2.avsc"));
>> GenericRecord record = new GenericData.Record(schema);
>> record.put("dob", "161913600000");
>> record.put("gndr", "f");
>> record.put("occupation", "doctor");
>> record.put("mrtlStatus", "single");
>> record.put("numChldrn", 3);
>> record.put("estInc", "50000");
>> record.put("schemaId", 20001);
>> record.put("lmd", 1379814280254L);
>>
>>  GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
>> ByteArrayOutputStream os = new ByteArrayOutputStream();
>>
>>  Encoder e = EncoderFactory.get().binaryEncoder(os, null);
>>
>>  writer.write(record, e);
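The archived message breaks off at the write call. A typical continuation, assumed here rather than taken from the original, flushes the encoder and collects the bytes:

    e.flush();                        // push any buffered bytes into the output stream
    byte[] bytes = os.toByteArray();  // the binary Avro encoding of the e7 record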