Pig user mailing list

Pig AvroStorage Issue Regarding Schema Evolution
Environment:

Pig version: 0.11
Hadoop version: 0.23.6.0.1301071353

Script:
REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
REGISTER /homes/immilind/HadoopLocal/Jars/jackson-all-1.8.10.jar
REGISTER /homes/immilind/HadoopLocal/Jars/jackson-core-asl-1.8.10.jar
REGISTER /homes/immilind/HadoopLocal/Jars/jackson-jaxrs-1.8.10.jar
REGISTER /homes/immilind/HadoopLocal/Jars/jackson-mapper-asl-1.8.10.jar
REGISTER /homes/immilind/HadoopLocal/Jars/jackson-xc-1.8.10.jar
REGISTER /home/gs/pig/current/lib-hadoop23/piggybank.jar

employee = load '/user/immilind/AvroData'
    using org.apache.pig.piggybank.storage.avro.AvroStorage();
dump employee;
Schemas:

{
    "type": "record",
    "name": "employee",
    "fields": [
        {"name": "name", "type": "string", "default": "NU"},
        {"name": "age", "type": "int", "default": 0},
        {"name": "dept", "type": "string", "default": "DU"},
        {"name": "office", "type": "string", "default": "OU"},
        {"name": "salary", "type": "float", "default": 0.0}
    ]
}

{
    "type": "record",
    "name": "employee",
    "fields": [
        {"name": "name", "type": "string", "default": "NU"},
        {"name": "age", "type": "int", "default": 0},
        {"name": "dept", "type": "string", "default": "DU"},
        {"name": "office", "type": "string", "default": "OU"},
        {"name": "salary", "type": "int", "default": 0}
    ]
}
The two schemas differ in only one field: "salary" is "float" in the first and "int" in the second. According to the schema evolution/merging rules, I expected the "int" values to be loaded as "float". Instead, the job fails because of the field mismatch.
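For reference, the Avro specification's schema-resolution rules allow int to be promoted to long, float, or double; long to float or double; and float to double. The minimal Java sketch below illustrates the merge I was expecting for the "salary" field. The class PrimitiveTypeMerge and its methods are hypothetical names for illustration; this is not the piggybank AvroStorageUtils.mergeType code, and it covers only the numeric promotions.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not piggybank's AvroStorageUtils.mergeType):
 * merges two Avro primitive type names by widening, using the numeric
 * promotions from the Avro spec: int -> long/float/double,
 * long -> float/double, float -> double.
 */
public class PrimitiveTypeMerge {

    // Numeric promotions allowed by Avro schema resolution (other types omitted).
    private static final Map<String, List<String>> PROMOTIONS = Map.of(
        "int", List.of("long", "float", "double"),
        "long", List.of("float", "double"),
        "float", List.of("double"));

    /** True if values of type {@code from} can be read as type {@code to}. */
    static boolean canPromote(String from, String to) {
        return from.equals(to) || PROMOTIONS.getOrDefault(from, List.of()).contains(to);
    }

    /**
     * Returns the type both inputs can be promoted to, or null when they are
     * incompatible (the case where a load would fail with a mismatch).
     */
    static String merge(String a, String b) {
        if (canPromote(a, b)) {
            return b;
        }
        if (canPromote(b, a)) {
            return a;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(merge("float", "int"));  // float
        System.out.println(merge("int", "long"));   // long
        System.out.println(merge("int", "string")); // null
    }
}
```

Because the numeric promotions form a single chain (int, long, float, double), the merged type is simply the wider of the two, so an int/float pair should resolve to float.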

I am referring to:

A similar thread, "Working with changing schemas (avro) in Pig":
https://mail-archives.apache.org/mod_mbox/pig-user/201204.mbox/%[EMAIL PROTECTED]%3E

JIRA:
https://issues.apache.org/jira/browse/PIG-2579
How do I use the 'multiple_schema' option with AvroStorage, as suggested by this JIRA?

The mergeType function, which defines the merging rules for primitive types:
https://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java

Can anybody suggest what is going wrong?