Pig user mailing list: Pig Avrostorage Issue regarding Schema evaluation


Milind Vaidya 2013-01-09, 14:49
Re: Pig Avrostorage Issue regarding Schema evaluation
Hi Milind,

Please try this:

REGISTER build/ivy/lib/Pig/avro-1.7.1.jar
REGISTER build/ivy/lib/Pig/json-simple-1.1.jar
REGISTER build/ivy/lib/Pig/jackson-mapper-asl-1.8.8.jar
REGISTER build/ivy/lib/Pig/jackson-core-asl-1.8.8.jar
REGISTER contrib/piggybank/java/piggybank.jar

employee = LOAD '/home/cheolsoo/workspace/avro/emplyees' USING
org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
DESCRIBE employee;
DUMP employee;

I have two Avro files in my input directory:

$ java -jar /home/cheolsoo/workspace/avro/avro-tools-1.7.1.jar tojson record_employee.avro
{"name":"a","age":0,"dept":"b","office":"c","salary":0.0}

$ java -jar /home/cheolsoo/workspace/avro/avro-tools-1.7.1.jar tojson record_employee2.avro
{"name":"a","age":0,"dept":"b","office":"c","salary":0}

In record_employee.avro the salary field is a float; in record_employee2.avro
it is an int.
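To confirm where the two files diverge, the writer schemas from the original question can be compared field by field. The following is a minimal Python sketch, assuming the schemas are represented as plain dicts transcribed from the quoted message (defaults omitted); `employee_schema` and `diff_fields` are hypothetical helper names, not part of Avro, Pig, or piggybank, and only top-level field types are compared.

```python
def employee_schema(salary_type):
    """Record schema from the thread; only the salary type varies (defaults omitted)."""
    return {
        "type": "record",
        "name": "employee",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "age", "type": "int"},
            {"name": "dept", "type": "string"},
            {"name": "office", "type": "string"},
            {"name": "salary", "type": salary_type},
        ],
    }


def diff_fields(a, b):
    """Map each field whose type differs to (type_in_a, type_in_b).

    A field missing from one schema is reported with None on that side.
    """
    types_a = {f["name"]: f["type"] for f in a["fields"]}
    types_b = {f["name"]: f["type"] for f in b["fields"]}
    return {
        name: (types_a.get(name), types_b.get(name))
        for name in sorted(types_a.keys() | types_b.keys())
        if types_a.get(name) != types_b.get(name)
    }


print(diff_fields(employee_schema("float"), employee_schema("int")))
# {'salary': ('float', 'int')}
```

Only `salary` is reported, which matches the statement in the question that the schemas differ in a single field.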

The output looks as follows:

...
employee: {name: chararray,age: int,dept: chararray,office: chararray,salary: float}
...
(a,0,b,c,0.0)
(a,0,b,c,0)
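The DESCRIBE output reports salary: float, i.e. the int and float declarations were reconciled to the wider type. The Python sketch below models that kind of reconciliation for flat records, assuming the numeric widening order int < long < float < double taken from the Avro specification's promotion rules. It is an illustration only, not a port of mergeType in AvroStorageUtils.java, whose exact rules (strings, enums, unions, defaults) may differ; `merge_primitive`, `merge_records`, and `employee` are hypothetical names.

```python
# Widening order assumed from Avro's numeric promotions (int -> long -> float -> double).
NUMERIC_ORDER = ["int", "long", "float", "double"]

# Field names/types from the thread; salary is appended with a varying type.
FIELDS = [("name", "string"), ("age", "int"), ("dept", "string"), ("office", "string")]


def employee(salary_type):
    return {"type": "record", "name": "employee",
            "fields": [{"name": n, "type": t} for n, t in FIELDS + [("salary", salary_type)]]}


def merge_primitive(t1, t2):
    """Return a type able to hold values of both t1 and t2, or raise ValueError."""
    if t1 == t2:
        return t1
    if t1 in NUMERIC_ORDER and t2 in NUMERIC_ORDER:
        # The wider type is the one that appears later in the order.
        return max(t1, t2, key=NUMERIC_ORDER.index)
    raise ValueError(f"cannot merge {t1!r} with {t2!r}")


def merge_records(a, b):
    """Merge two flat record schemas field by field (types only, defaults dropped).

    Shared fields go through merge_primitive; fields present in only one schema
    are kept unchanged. Order follows schema a, then fields new in b.
    """
    types_b = {f["name"]: f["type"] for f in b["fields"]}
    merged, seen = [], set()
    for f in a["fields"]:
        t = f["type"]
        if f["name"] in types_b:
            t = merge_primitive(t, types_b[f["name"]])
        merged.append({"name": f["name"], "type": t})
        seen.add(f["name"])
    for f in b["fields"]:
        if f["name"] not in seen:
            merged.append({"name": f["name"], "type": f["type"]})
    return {"type": "record", "name": a["name"], "fields": merged}


merged = merge_records(employee("float"), employee("int"))
print([(f["name"], f["type"]) for f in merged["fields"]])
# [('name', 'string'), ('age', 'int'), ('dept', 'string'), ('office', 'string'), ('salary', 'float')]
```

Under this model the numeric merge is symmetric, so loading the int file first would also yield salary: float.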

Thanks,
Cheolsoo
On Wed, Jan 9, 2013 at 6:49 AM, Milind Vaidya <[EMAIL PROTECTED]> wrote:

> Environment:
>
> Pig version: 0.11
> Hadoop 0.23.6.0.1301071353
>
>
> Script:
>
>
> REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-all-1.8.10.jar
> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-core-asl-1.8.10.jar
> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-jaxrs-1.8.10.jar
> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-mapper-asl-1.8.10.jar
> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-xc-1.8.10.jar
> REGISTER /home/gs/pig/current/lib-hadoop23/piggybank.jar
>
> employee= load '/user/immilind/AvroData' using
> org.apache.pig.piggybank.storage.avro.AvroStorage( );
> dump employee;
>
>
> Schemas:
>
> {
> "type" : "record",
> "name" : "employee",
> "fields":[
>     {"name" : "name", "type" : "string", "default" : "NU"},
>     {"name" : "age", "type" : "int","default" : 0},
>     {"name" : "dept", "type": "string","default" : "DU"},
>     {"name" : "office", "type": "string","default" : "OU"},
>     {"name" : "salary", "type": "float","default" : 0.0}
> ]
> }
>
> {
> "type" : "record",
> "name" : "employee",
> "fields":[
>     {"name" : "name", "type" : "string", "default" : "NU"},
>     {"name" : "age", "type" : "int","default" : 0},
>     {"name" : "dept", "type": "string","default" : "DU"},
>     {"name" : "office", "type": "string","default" : "OU"},
>     {"name" : "salary", "type": "int", "default" : 0}
> ]
> }
>
>
> The two schemas differ in only one field. Per the schema evolution/
> merging rules, I expect the "int" fields to be loaded as "float".
> Instead, the job fails with a field mismatch.
>
> I am referring to :
>
> A similar thread, "Working with changing schemas (avro) in Pig":
>
> https://mail-archives.apache.org/mod_mbox/pig-user/201204.mbox/%[EMAIL PROTECTED]%3E
>
> JIRA:
> https://issues.apache.org/jira/browse/PIG-2579
> How do I use the "multiple_schemas" option with AvroStorage, as suggested
> by this JIRA?
>
> The mergeType function, which defines the merge rules for primitive types:
>
> https://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
>
>
>
> Can anybody suggest what is going wrong?
>
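The expectation in the question rests on Avro's schema-resolution rules, under which a value written with a narrower numeric type can be read through a wider reader type. A minimal Python sketch of only the numeric promotions from the Avro specification follows; `PROMOTIONS` and `can_promote` are hypothetical names, and the sketch models the Avro specification rather than the mergeType code linked in the question, which may apply its own rules.

```python
# Numeric promotions from the Avro specification's schema-resolution rules:
# a value written with the key type may be read with any type in its set.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
}


def can_promote(writer_type, reader_type):
    """Return True if a primitive written as writer_type resolves to reader_type.

    Only identical types and the numeric promotions above are modelled;
    records, unions, and any other resolution rules are left out.
    """
    return writer_type == reader_type or reader_type in PROMOTIONS.get(writer_type, set())


print(can_promote("int", "float"))  # True: int data can be read with a float reader type
print(can_promote("float", "int"))  # False: narrowing is not a promotion
```

Promotion runs only from narrow to wide, so a single reader schema able to read both files would need salary declared as float (or double).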
Russell Jurney 2013-01-09, 19:25
Cheolsoo Park 2013-01-09, 22:15
Russell Jurney 2013-01-09, 22:37