Re: Pig Avrostorage Issue regarding Schema evaluation
Hi Milind,

Please try this:

REGISTER build/ivy/lib/Pig/avro-1.7.1.jar
REGISTER build/ivy/lib/Pig/json-simple-1.1.jar
REGISTER build/ivy/lib/Pig/jackson-mapper-asl-1.8.8.jar
REGISTER build/ivy/lib/Pig/jackson-core-asl-1.8.8.jar
REGISTER contrib/piggybank/java/piggybank.jar

employee = LOAD '/home/cheolsoo/workspace/avro/emplyees' USING
org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
DESCRIBE employee;
DUMP employee;

I have two Avro files in my input directory:

$ java -jar /home/cheolsoo/workspace/avro/avro-tools-1.7.1.jar tojson record_employee.avro
{"name":"a","age":0,"dept":"b","office":"c","salary":0.0}

$ java -jar /home/cheolsoo/workspace/avro/avro-tools-1.7.1.jar tojson record_employee2.avro
{"name":"a","age":0,"dept":"b","office":"c","salary":0}

The salary field is a float in record_employee.avro and an int in
record_employee2.avro.

The output looks as follows:

...
employee: {name: chararray,age: int,dept: chararray,office: chararray,salary: float}
...
(a,0,b,c,0.0)
(a,0,b,c,0)
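
For reference, the int-to-float widening that 'multiple_schemas' relies on
can be sketched roughly as below. This is only a minimal Python illustration,
assuming the numeric promotion order from the Avro specification (int, long,
float, double). The names merge_primitive and merge_fields are hypothetical,
and this is not the piggybank code; the actual rules are in
AvroStorageUtils.mergeType.

```python
# Illustrative sketch only: NOT the piggybank AvroStorageUtils.mergeType code.
# Assumes the numeric promotion order from the Avro specification.
NUMERIC_RANK = {"int": 0, "long": 1, "float": 2, "double": 3}


def merge_primitive(a, b):
    """Return the wider of two primitive Avro type names (hypothetical helper)."""
    if a == b:
        return a
    if a in NUMERIC_RANK and b in NUMERIC_RANK:
        return a if NUMERIC_RANK[a] > NUMERIC_RANK[b] else b
    raise ValueError("cannot merge %s with %s in this sketch" % (a, b))


def merge_fields(fields_a, fields_b):
    """Merge two {field name: type} mappings that share the same field names."""
    return {name: merge_primitive(fields_a[name], fields_b[name]) for name in fields_a}


# Field types taken from the two employee schemas in this thread.
schema1 = {"name": "string", "age": "int", "dept": "string",
           "office": "string", "salary": "float"}
schema2 = {"name": "string", "age": "int", "dept": "string",
           "office": "string", "salary": "int"}

merged = merge_fields(schema1, schema2)
print(merged["salary"])  # float
```

The merged salary type agrees with the DESCRIBE output above, where salary
is reported as float.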

Thanks,
Cheolsoo
On Wed, Jan 9, 2013 at 6:49 AM, Milind Vaidya <[EMAIL PROTECTED]> wrote:

> Environment:
>
> Pig version: 0.11
> Hadoop 0.23.6.0.1301071353
>
>
> Script:
>
>
> REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-all-1.8.10.jar
> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-core-asl-1.8.10.jar
> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-jaxrs-1.8.10.jar
> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-mapper-asl-1.8.10.jar
> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-xc-1.8.10.jar
> REGISTER /home/gs/pig/current/lib-hadoop23/piggybank.jar
>
> employee= load '/user/immilind/AvroData' using
> org.apache.pig.piggybank.storage.avro.AvroStorage( );
> dump employee;
>
>
> Schemas :
>
> {
> "type" : "record",
> "name" : "employee",
> "fields":[
>     {"name" : "name", "type" : "string", "default" : "NU"},
>     {"name" : "age", "type" : "int","default" : 0},
>     {"name" : "dept", "type": "string","default" : "DU"},
>     {"name" : "office", "type": "string","default" : "OU"},
>     {"name" : "salary", "type": "float","default" : 0.0}
> ]
> }
>
> {
> "type" : "record",
> "name" : "employee",
> "fields":[
>     {"name" : "name", "type" : "string", "default" : "NU"},
>     {"name" : "age", "type" : "int","default" : 0},
>     {"name" : "dept", "type": "string","default" : "DU"},
>     {"name" : "office", "type": "string","default" : "OU"},
>     {"name" : "salary", "type": "int", "default" : 0}
> ]
> }
>
>
> The two schemas differ in only one field, salary. As per the schema
> evolution/merging rules, I expect the "int" values to be loaded as "float".
> Instead, the job fails due to a field mismatch.
>
> I am referring to :
>
> Similar thread named "Working with changing schemas (avro) in Pig"
>
> https://mail-archives.apache.org/mod_mbox/pig-user/201204.mbox/%[EMAIL PROTECTED]%3E
>
> JIRA:
> https://issues.apache.org/jira/browse/PIG-2579
> How do I use the 'multiple_schemas' option with AvroStorage, as suggested by
> this JIRA?
>
> Function mergeType indicating rules for primitive types
>
> https://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
>
>
>
> Can anybody suggest what is going wrong?
>