Pig Avrostorage Issue regarding Schema evaluation
Environment:

Pig version: 0.11
Hadoop version: 0.23.6.0.1301071353

Script:

REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
REGISTER /homes/immilind/HadoopLocal/Jars/jackson-all-1.8.10.jar
REGISTER /homes/immilind/HadoopLocal/Jars/jackson-core-asl-1.8.10.jar
REGISTER /homes/immilind/HadoopLocal/Jars/jackson-jaxrs-1.8.10.jar
REGISTER /homes/immilind/HadoopLocal/Jars/jackson-mapper-asl-1.8.10.jar
REGISTER /homes/immilind/HadoopLocal/Jars/jackson-xc-1.8.10.jar
REGISTER /home/gs/pig/current/lib-hadoop23/piggybank.jar

employee = load '/user/immilind/AvroData' using
org.apache.pig.piggybank.storage.avro.AvroStorage();
dump employee;

Schemas:

{
"type" : "record",
"name" : "employee",
"fields":[
    {"name" : "name", "type" : "string", "default" : "NU"},
    {"name" : "age", "type" : "int","default" : 0},
    {"name" : "dept", "type": "string","default" : "DU"},
    {"name" : "office", "type": "string","default" : "OU"},
    {"name" : "salary", "type": "float","default" : 0.0}
]
}

{
"type" : "record",
"name" : "employee",
"fields":[
    {"name" : "name", "type" : "string", "default" : "NU"},
    {"name" : "age", "type" : "int","default" : 0},
    {"name" : "dept", "type": "string","default" : "DU"},
    {"name" : "office", "type": "string","default" : "OU"},
    {"name" : "salary", "type": "int", "default" : 0}
]
}

The two schemas differ in only one field: "salary" is a "float" in the first
and an "int" in the second. As per the schema evolution/merging rules, I
expect the "int" values to be loaded as "float". Instead, the job fails due
to a field mismatch.
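
To make the expectation concrete, here is a minimal sketch in Python of the numeric promotion I have in mind. It is not AvroStorage code: the names PROMOTION_ORDER, merge_primitive and merge_fields are hypothetical, and the schemas are reduced to plain field-name/type-name pairs. The ordering int, long, float, double follows the promotions allowed by Avro's schema resolution rules; whether AvroStorage's mergeType applies exactly the same ordering is an assumption.

```python
# Hypothetical sketch of the numeric promotion expected when two schemas
# disagree on a primitive type. This is NOT the AvroStorage implementation.

# Avro schema resolution allows int -> long -> float -> double promotion.
PROMOTION_ORDER = ["int", "long", "float", "double"]


def merge_primitive(left, right):
    """Return the wider of two Avro primitive type names."""
    if left == right:
        return left
    if left in PROMOTION_ORDER and right in PROMOTION_ORDER:
        # The type that appears later in the order can hold the other one.
        return max(left, right, key=PROMOTION_ORDER.index)
    raise ValueError("cannot merge %s with %s" % (left, right))


def merge_fields(fields_a, fields_b):
    """Merge two {field name: type name} dicts that share the same field names."""
    return {name: merge_primitive(fields_a[name], fields_b[name])
            for name in fields_a}


# The two employee schemas from this message, reduced to name/type pairs.
schema_1 = {"name": "string", "age": "int", "dept": "string",
            "office": "string", "salary": "float"}
schema_2 = {"name": "string", "age": "int", "dept": "string",
            "office": "string", "salary": "int"}

merged = merge_fields(schema_1, schema_2)
print(merged["salary"])  # prints "float"
```

Under these assumptions the merged record keeps every field unchanged except "salary", which widens to "float". That is the result I expected from the load.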

I am referring to:

A similar thread named "Working with changing schemas (avro) in Pig":
https://mail-archives.apache.org/mod_mbox/pig-user/201204.mbox/%[EMAIL PROTECTED]%3E

JIRA:
https://issues.apache.org/jira/browse/PIG-2579
How do I use the "multiple_schema" option with "AvroStorage", as suggested by
this JIRA?

The mergeType function, which defines the merging rules for primitive types:
https://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java

Can anybody suggest what is going wrong?