Pig >> mail # user >> Pig Avrostorage Issue regarding Schema evaluation


Milind Vaidya 2013-01-09, 14:49
Cheolsoo Park 2013-01-09, 18:26
Russell Jurney 2013-01-09, 19:25
Cheolsoo Park 2013-01-09, 22:15

Re: Pig Avrostorage Issue regarding Schema evaluation
Np, I included it myself forever before someone pointed that out :)

Russell Jurney http://datasyndrome.com

On Jan 9, 2013, at 2:15 PM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Hi Russell,
>
> You're absolutely right. Jackson is not needed. Thanks for pointing that out!
>
> Cheolsoo
>
>
> On Wed, Jan 9, 2013 at 11:25 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>
>> Jackson is no longer needed, right? Or is it coming back in 0.11?
>>
>> Russell Jurney http://datasyndrome.com
>>
>> On Jan 9, 2013, at 10:26 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Milind,
>>>
>>> Please try this:
>>>
>>> REGISTER build/ivy/lib/Pig/avro-1.7.1.jar
>>> REGISTER build/ivy/lib/Pig/json-simple-1.1.jar
>>> REGISTER build/ivy/lib/Pig/jackson-mapper-asl-1.8.8.jar
>>> REGISTER build/ivy/lib/Pig/jackson-core-asl-1.8.8.jar
>>> REGISTER contrib/piggybank/java/piggybank.jar
>>>
>>> employee = LOAD '/home/cheolsoo/workspace/avro/emplyees' USING
>>> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
>>> DESCRIBE employee;
>>> DUMP employee;
>>>
>>> I have two Avro files in my input directory:
>>>
>>> $java -jar /home/cheolsoo/workspace/avro/avro-tools-1.7.1.jar tojson
>>> record_employee.avro
>>> {"name":"a","age":0,"dept":"b","office":"c","salary":0.0}
>>>
>>> $java -jar /home/cheolsoo/workspace/avro/avro-tools-1.7.1.jar tojson
>>> record_employee2.avro
>>> {"name":"a","age":0,"dept":"b","office":"c","salary":0}
>>>
>>> record_employee.avro stores salary as a float, and record_employee2.avro
>>> stores it as an int.
>>>
>>> The output looks as follows:
>>>
>>> ...
>>> employee: {name: chararray,age: int,dept: chararray,office:
>>> chararray,salary: float}
>>> ...
>>> (a,0,b,c,0.0)
>>> (a,0,b,c,0)
>>>
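The merged schema in the output above (salary: float) is consistent with Avro's numeric promotion rules, where an int may be promoted to long, float, or double. The Python sketch below illustrates that rule on the two schemas from this thread. It is only an illustration under stated assumptions: PROMOTIONS, merge_types, and merge_records are hypothetical names, the sketch handles flat records with primitive types only, and AvroStorage's own merge logic for 'multiple_schemas' may differ in detail.

```python
# Numeric promotions permitted by Avro schema resolution:
# a value written with the key type can be read as any of the listed types.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
}


def merge_types(a, b):
    """Return the wider of two primitive type names, or raise if incompatible."""
    if a == b:
        return a
    if b in PROMOTIONS.get(a, ()):
        return b
    if a in PROMOTIONS.get(b, ()):
        return a
    raise ValueError(f"cannot merge {a!r} with {b!r}")


def merge_records(s1, s2):
    """Merge two flat record schemas that share the same field names.

    Hypothetical helper: returns a dict mapping field name to merged type.
    """
    t1 = {f["name"]: f["type"] for f in s1["fields"]}
    t2 = {f["name"]: f["type"] for f in s2["fields"]}
    if t1.keys() != t2.keys():
        raise ValueError("field names differ")
    return {name: merge_types(t1[name], t2[name]) for name in t1}


def employee_schema(salary_type):
    """Build the employee schema from the thread with the given salary type."""
    return {
        "type": "record",
        "name": "employee",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "age", "type": "int"},
            {"name": "dept", "type": "string"},
            {"name": "office", "type": "string"},
            {"name": "salary", "type": salary_type},
        ],
    }


merged = merge_records(employee_schema("float"), employee_schema("int"))
print(merged)
```

In this sketch the salary field resolves to float while every other field keeps its type, which matches the DESCRIBE output shown above.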
>>> Thanks,
>>> Cheolsoo
>>>
>>>
>>>
>>>
>>> On Wed, Jan 9, 2013 at 6:49 AM, Milind Vaidya <[EMAIL PROTECTED]> wrote:
>>>
>>>> Environment:
>>>>
>>>> Pig version: 0.11
>>>> Hadoop 0.23.6.0.1301071353
>>>>
>>>>
>>>> Script:
>>>>
>>>>
>>>> REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
>>>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-all-1.8.10.jar
>>>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-core-asl-1.8.10.jar
>>>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-jaxrs-1.8.10.jar
>>>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-mapper-asl-1.8.10.jar
>>>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-xc-1.8.10.jar
>>>> REGISTER /home/gs/pig/current/lib-hadoop23/piggybank.jar
>>>>
>>>> employee= load '/user/immilind/AvroData' using
>>>> org.apache.pig.piggybank.storage.avro.AvroStorage( );
>>>> dump employee;
>>>>
>>>>
>>>> Schemas :
>>>>
>>>> {
>>>> "type" : "record",
>>>> "name" : "employee",
>>>> "fields":[
>>>>   {"name" : "name", "type" : "string", "default" : "NU"},
>>>>   {"name" : "age", "type" : "int","default" : 0},
>>>>   {"name" : "dept", "type": "string","default" : "DU"},
>>>>   {"name" : "office", "type": "string","default" : "OU"},
>>>>   {"name" : "salary", "type": "float","default" : 0.0}
>>>> ]
>>>> }
>>>>
>>>> {
>>>> "type" : "record",
>>>> "name" : "employee",
>>>> "fields":[
>>>>   {"name" : "name", "type" : "string", "default" : "NU"},
>>>>   {"name" : "age", "type" : "int","default" : 0},
>>>>   {"name" : "dept", "type": "string","default" : "DU"},
>>>>   {"name" : "office", "type": "string","default" : "OU"},
>>>>   {"name" : "salary", "type": "int", "default" : 0}
>>>> ]
>>>> }
>>>>
>>>>
>>>> The two schemas differ only in the type of the salary field. As per the
>>>> schema evolution/merging rules, I expect the "int" field to be loaded as
>>>> "float". Instead, the job fails due to a field mismatch.
>>>>
>>>> I am referring to :
>>>>
>>>> Similar thread named "Working with changing schemas (avro) in Pig"
>>>>
>>>>
>>>> https://mail-archives.apache.org/mod_mbox/pig-user/201204.mbox/%[EMAIL PROTECTED]%3E
>>>>
>>>> JIRA:
>>>> https://issues.apache.org/jira/browse/PIG-2579
>>>> How to use the "multiple_schemas" option with "AvroStorage" as suggested by