Avro >> mail # user >> Deserialize the attributes data using another schema give me wrong results

Raihan Jamal 2013-09-26, 00:10
Eric Wasserman 2013-09-26, 00:30
Re: Deserialize the attributes data using another schema give me wrong results
Thanks Eric. I have a couple of follow-up questions:

1) Does that mean we cannot deserialize attribute data with any schema other
than the one that wrote it? Do we always need to pass the schema that was
used for writing, along with whatever schema I want to use for reading? Is
that correct?
2) Is there any way to deserialize attribute data using a different schema
without passing the actual schema that was used to serialize it?

In my example, as you can see, I already store a schemaId field in the Avro
record, and it maps to an actual schema name. So when we serialize any
attribute data, the schemaId is stored inside the Avro binary-encoded value
and identifies the schema that was used to serialize it. When deserializing,
we would first grab the schemaId (by deserializing with another schema), look
up which schema was actually used to serialize the attribute, and then
deserialize the attribute again using that actual schema...
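
A minimal sketch of one alternative to the plan above, not something taken from the thread: since the reader needs the writer's schema before it can decode any field, including schemaId, the ID can instead travel outside the Avro-encoded bytes as a fixed-size prefix. The class name `SchemaIdFraming`, the 4-byte big-endian header, and the registry that maps 20001 to `schema2.avsc` are all assumptions made for illustration. The payload here is a stand-in byte array rather than real Avro output.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: prefixes an encoded record with a 4-byte schema ID. */
final class SchemaIdFraming {

    private static final int HEADER_BYTES = Integer.BYTES;

    /** Returns [4-byte big-endian schemaId][payload]. */
    static byte[] frame(int schemaId, byte[] payload) {
        return ByteBuffer.allocate(HEADER_BYTES + payload.length)
                .putInt(schemaId)
                .put(payload)
                .array();
    }

    /** Reads the schema ID from the header without touching the payload. */
    static int readSchemaId(byte[] framed) {
        if (framed.length < HEADER_BYTES) {
            throw new IllegalArgumentException("framed value is shorter than the header");
        }
        return ByteBuffer.wrap(framed).getInt();
    }

    /** Returns a copy of the bytes that follow the header. */
    static byte[] readPayload(byte[] framed) {
        readSchemaId(framed); // reuses the length check
        return Arrays.copyOfRange(framed, HEADER_BYTES, framed.length);
    }

    public static void main(String[] args) {
        // Assumed registry: schema ID -> schema resource name.
        Map<Integer, String> registry = new HashMap<>();
        registry.put(20001, "schema2.avsc");

        // Stand-in for the Avro binary-encoded record (byteData in the thread).
        byte[] payload = "avro-bytes-go-here".getBytes(StandardCharsets.UTF_8);
        byte[] framed = frame(20001, payload);

        // Reading side: header first, then look up the writer's schema.
        int id = readSchemaId(framed);
        String writerSchemaName = registry.get(id);
        System.out.println("schemaId=" + id + " -> " + writerSchemaName);
    }
}
```

With the writer's schema looked up this way, the payload from `readPayload` could then be handed to a `GenericDatumReader` built with the two-argument (writer, reader) constructor quoted below.
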
*Raihan Jamal*
On Wed, Sep 25, 2013 at 5:30 PM, Eric Wasserman <[EMAIL PROTECTED]> wrote:

> Short answer. Use this constructor instead:
>
>   /** Construct given writer's and reader's schema. */
>   public GenericDatumReader(Schema writer, Schema reader) {
>
> Longer answer:
> You have to give the GenericDatumReader the EXACT schema that wrote the
> bytes that you are trying to parse ("writer's schema").
> You can *also* give it another schema you'd like to use ("reader's
> schema") that can be different.
>
> Try changing this line of your code:
>
>   GenericDatumReader<GenericRecord> r1 =
>       new GenericDatumReader<GenericRecord>(schema1);
>
> To this:
>
>   // writer's schema is "schema2", reader's schema is "schema1"
>   GenericDatumReader<GenericRecord> r1 =
>       new GenericDatumReader<GenericRecord>(schema2, schema1);
>  ------------------------------
> *From:* Raihan Jamal <[EMAIL PROTECTED]>
> *Sent:* Wednesday, September 25, 2013 5:10 PM
> *Subject:* Deserialize the attributes data using another schema give me
> wrong results
> I am trying to serialize the data for one of our attributes using an Apache
> Avro schema. Here the attribute name is `e7`, and the schema that I am using
> to serialize it is `schema2.avsc`, which is below.
>     {
>         "namespace": "com.avro.test.AvroExperiment",
>         "type": "record",
>         "name": "DEMOGRAPHIC",
>         "doc": "DEMOGRAPHIC data",
>         "fields": [
>             {"name": "dob", "type": "string"},
>             {"name": "gndr", "type": "string"},
>             {"name": "occupation", "type": "string"},
>             {"name": "mrtlStatus", "type": "string"},
>             {"name": "numChldrn", "type": "int"},
>             {"name": "estInc", "type": "string"},
>             {"name": "schemaId", "type": "int"},
>             {"name": "lmd", "type": "long"}
>         ]
>     }
> Below is the code that I am using to serialize the attribute `e7` using the
> above Avro schema, `schema2.avsc`. I am able to serialize it properly and it
> works fine...
>   Schema schema = new Parser().parse(
>       AvroExperiment.class.getResourceAsStream("/schema2.avsc"));
>
>   GenericRecord record = new GenericData.Record(schema);
>   record.put("dob", "161913600000");
>   record.put("gndr", "f");
>   record.put("occupation", "doctor");
>   record.put("mrtlStatus", "single");
>   record.put("numChldrn", 3);
>   record.put("estInc", "50000");
>   record.put("schemaId", 20001);
>   record.put("lmd", 1379814280254L);
>
>   GenericDatumWriter<GenericRecord> writer =
>       new GenericDatumWriter<GenericRecord>(schema);
>   ByteArrayOutputStream os = new ByteArrayOutputStream();
>
>   Encoder e = EncoderFactory.get().binaryEncoder(os, null);
>   writer.write(record, e);
>   e.flush();
>
>   byte[] byteData = os.toByteArray();
>   os.close();
> Next, I tried deserializing the same `e7` attribute data using the same
> Avro schema definition, `schema2.avsc`. That also works fine, and I am able
> to deserialize it properly.
>  GenericDatumReader<GenericRecord> r = new
Raihan Jamal 2013-09-26, 07:33