Pig, mail # user - Pig Avrostorage Issue regarding Schema evaluation


Re: Pig Avrostorage Issue regarding Schema evaluation
Russell Jurney 2013-01-09, 19:25
Jackson is no longer needed, right? Or is it coming back in 0.11?

Russell Jurney http://datasyndrome.com

On Jan 9, 2013, at 10:26 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Hi Milind,
>
> Please try this:
>
> REGISTER build/ivy/lib/Pig/avro-1.7.1.jar
> REGISTER build/ivy/lib/Pig/json-simple-1.1.jar
> REGISTER build/ivy/lib/Pig/jackson-mapper-asl-1.8.8.jar
> REGISTER build/ivy/lib/Pig/jackson-core-asl-1.8.8.jar
> REGISTER contrib/piggybank/java/piggybank.jar
>
> employee = LOAD '/home/cheolsoo/workspace/avro/emplyees' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
> DESCRIBE employee;
> DUMP employee;
>
> I have two Avro files in my input directory:
>
> $ java -jar /home/cheolsoo/workspace/avro/avro-tools-1.7.1.jar tojson
> record_employee.avro
> {"name":"a","age":0,"dept":"b","office":"c","salary":0.0}
>
> $ java -jar /home/cheolsoo/workspace/avro/avro-tools-1.7.1.jar tojson
> record_employee2.avro
> {"name":"a","age":0,"dept":"b","office":"c","salary":0}
>
> In record_employee.avro the salary field is a float; in
> record_employee2.avro it is an int.
>
> The output looks as follows:
>
> ...
> employee: {name: chararray,age: int,dept: chararray,office:
> chararray,salary: float}
> ...
> (a,0,b,c,0.0)
> (a,0,b,c,0)
>
> Thanks,
> Cheolsoo
>
>
>
>
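The merged schema in the output above (salary: float for an int file plus a float file) is what a numeric widening rule would produce. A minimal sketch of such a rule follows. The widening order and the function name merge_primitive are assumptions made for illustration; they are not copied from AvroStorageUtils.mergeType, which may cover more cases (for example strings and enums).

```python
# Hypothetical widening order, narrowest to widest. This ordering is an
# assumption for illustration and is not taken from the piggybank source.
NUMERIC_ORDER = ["int", "long", "float", "double"]


def merge_primitive(x, y):
    """Return the wider of two primitive type names, or raise if they cannot be merged."""
    if x == y:
        return x
    if x in NUMERIC_ORDER and y in NUMERIC_ORDER:
        # Pick whichever type sits later in the widening order.
        return max(x, y, key=NUMERIC_ORDER.index)
    raise ValueError(f"cannot merge {x!r} with {y!r}")


print(merge_primitive("int", "float"))  # prints: float
```

Under this sketch, merging the two salary types from the thread yields float in either argument order, which matches the DESCRIBE output above.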
> On Wed, Jan 9, 2013 at 6:49 AM, Milind Vaidya <[EMAIL PROTECTED]> wrote:
>
>> Environment:
>>
>> Pig version: 0.11
>> Hadoop 0.23.6.0.1301071353
>>
>>
>> Script:
>>
>>
>> REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-all-1.8.10.jar
>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-core-asl-1.8.10.jar
>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-jaxrs-1.8.10.jar
>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-mapper-asl-1.8.10.jar
>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-xc-1.8.10.jar
>> REGISTER /home/gs/pig/current/lib-hadoop23/piggybank.jar
>>
>> employee= load '/user/immilind/AvroData' using
>> org.apache.pig.piggybank.storage.avro.AvroStorage( );
>> dump employee;
>>
>>
>> Schemas :
>>
>> {
>> "type" : "record",
>> "name" : "employee",
>> "fields":[
>>    {"name" : "name", "type" : "string", "default" : "NU"},
>>    {"name" : "age", "type" : "int","default" : 0},
>>    {"name" : "dept", "type": "string","default" : "DU"},
>>    {"name" : "office", "type": "string","default" : "OU"},
>>    {"name" : "salary", "type": "float","default" : 0.0}
>> ]
>> }
>>
>> {
>> "type" : "record",
>> "name" : "employee",
>> "fields":[
>>    {"name" : "name", "type" : "string", "default" : "NU"},
>>    {"name" : "age", "type" : "int","default" : 0},
>>    {"name" : "dept", "type": "string","default" : "DU"},
>>    {"name" : "office", "type": "string","default" : "OU"},
>>    {"name" : "salary", "type": "int", "default" : 0}
>> ]
>> }
>>
>>
>> The two schemas differ in only one field, salary. Per the schema evolution/
>> merging rules, I expect the "int" values to be loaded as "float". Instead,
>> the job fails with a field mismatch.
>>
>> I am referring to:
>>
>> A similar thread named "Working with changing schemas (avro) in Pig":
>>
>> https://mail-archives.apache.org/mod_mbox/pig-user/201204.mbox/%[EMAIL PROTECTED]%3E
>>
>> JIRA:
>> https://issues.apache.org/jira/browse/PIG-2579
>> How do I use the 'multiple_schemas' option with AvroStorage, as suggested by
>> this JIRA?
>>
>> The mergeType function, which gives the merging rules for primitive types:
>>
>> https://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java
>>
>>
>>
>> Can anybody suggest what is going wrong?
>>
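To confirm which fields actually differ between two Avro schemas before loading them together, the schema JSON can be compared directly. The sketch below uses only the Python standard library and the two schemas quoted in the original question; the helper name differing_fields is hypothetical, and it compares only top-level field types, not nested records or unions.

```python
import json

# The two record schemas from the original question, as Avro schema JSON.
SCHEMA_A = json.loads("""
{"type": "record", "name": "employee", "fields": [
  {"name": "name",   "type": "string", "default": "NU"},
  {"name": "age",    "type": "int",    "default": 0},
  {"name": "dept",   "type": "string", "default": "DU"},
  {"name": "office", "type": "string", "default": "OU"},
  {"name": "salary", "type": "float",  "default": 0.0}
]}
""")

SCHEMA_B = json.loads("""
{"type": "record", "name": "employee", "fields": [
  {"name": "name",   "type": "string", "default": "NU"},
  {"name": "age",    "type": "int",    "default": 0},
  {"name": "dept",   "type": "string", "default": "DU"},
  {"name": "office", "type": "string", "default": "OU"},
  {"name": "salary", "type": "int",    "default": 0}
]}
""")


def differing_fields(a, b):
    """Map each field name whose type differs to its (type in a, type in b) pair."""
    types_a = {f["name"]: f["type"] for f in a["fields"]}
    types_b = {f["name"]: f["type"] for f in b["fields"]}
    return {
        name: (types_a.get(name), types_b.get(name))
        for name in sorted(types_a.keys() | types_b.keys())
        if types_a.get(name) != types_b.get(name)
    }


print(differing_fields(SCHEMA_A, SCHEMA_B))  # prints: {'salary': ('float', 'int')}
```

A field present in only one schema shows up with None on the missing side, which helps separate added or removed fields from type changes.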