Pig >> mail # user >> Pig Avrostorage Issue regarding Schema evaluation


Re: Pig Avrostorage Issue regarding Schema evaluation
Np, I included it myself forever before someone pointed that out :)

Russell Jurney http://datasyndrome.com
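
For reference, the script quoted below could presumably be trimmed to the following once the Jackson jars are dropped. This is only a sketch: it reuses the paths, jar versions, and the 'multiple_schemas' option from Cheolsoo's message, and it assumes, as agreed in this thread, that Jackson no longer has to be registered explicitly.

```pig
-- Assumption: Jackson is already available to Pig, so only the Avro,
-- json-simple and piggybank jars are registered (paths as in the thread).
REGISTER build/ivy/lib/Pig/avro-1.7.1.jar
REGISTER build/ivy/lib/Pig/json-simple-1.1.jar
REGISTER contrib/piggybank/java/piggybank.jar

-- 'multiple_schemas' asks AvroStorage to merge the schemas of the input files.
employee = LOAD '/home/cheolsoo/workspace/avro/emplyees' USING
    org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
DESCRIBE employee;
DUMP employee;
```

If the load fails with a missing Jackson class, re-adding the two jackson-*-asl REGISTER lines from the quoted script restores the original setup.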

On Jan 9, 2013, at 2:15 PM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Hi Russell,
>
> You're absolutely right. Jackson is not needed. Thanks for pointing that out!
>
> Cheolsoo
>
>
> On Wed, Jan 9, 2013 at 11:25 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>
>> Jackson is no longer needed, right? Or is it coming back in 0.11?
>>
>> Russell Jurney http://datasyndrome.com
>>
>> On Jan 9, 2013, at 10:26 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Milind,
>>>
>>> Please try this:
>>>
>>> REGISTER build/ivy/lib/Pig/avro-1.7.1.jar
>>> REGISTER build/ivy/lib/Pig/json-simple-1.1.jar
>>> REGISTER build/ivy/lib/Pig/jackson-mapper-asl-1.8.8.jar
>>> REGISTER build/ivy/lib/Pig/jackson-core-asl-1.8.8.jar
>>> REGISTER contrib/piggybank/java/piggybank.jar
>>>
>>> employee = LOAD '/home/cheolsoo/workspace/avro/emplyees' USING
>>> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
>>> DESCRIBE employee;
>>> DUMP employee;
>>>
>>> I have two Avro files in my input directory:
>>>
>>> $java -jar /home/cheolsoo/workspace/avro/avro-tools-1.7.1.jar tojson
>>> record_employee.avro
>>> {"name":"a","age":0,"dept":"b","office":"c","salary":0.0}
>>>
>>> $java -jar /home/cheolsoo/workspace/avro/avro-tools-1.7.1.jar tojson
>>> record_employee2.avro
>>> {"name":"a","age":0,"dept":"b","office":"c","salary":0}
>>>
>>> record_employee.avro contains a float, and record_employee2.avro contains
>>> an int.
>>>
>>> The output looks as follows:
>>>
>>> ...
>>> employee: {name: chararray,age: int,dept: chararray,office:
>>> chararray,salary: float}
>>> ...
>>> (a,0,b,c,0.0)
>>> (a,0,b,c,0)
>>>
>>> Thanks,
>>> Cheolsoo
>>>
>>>
>>>
>>>
>>> On Wed, Jan 9, 2013 at 6:49 AM, Milind Vaidya <[EMAIL PROTECTED]> wrote:
>>>
>>>> Environment:
>>>>
>>>> Pig version: 0.11
>>>> Hadoop 0.23.6.0.1301071353
>>>>
>>>>
>>>> Script:
>>>>
>>>>
>>>> REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
>>>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-all-1.8.10.jar
>>>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-core-asl-1.8.10.jar
>>>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-jaxrs-1.8.10.jar
>>>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-mapper-asl-1.8.10.jar
>>>> REGISTER /homes/immilind/HadoopLocal/Jars/jackson-xc-1.8.10.jar
>>>> REGISTER /home/gs/pig/current/lib-hadoop23/piggybank.jar
>>>>
>>>> employee= load '/user/immilind/AvroData' using
>>>> org.apache.pig.piggybank.storage.avro.AvroStorage( );
>>>> dump employee;
>>>>
>>>>
>>>> Schemas :
>>>>
>>>> {
>>>> "type" : "record",
>>>> "name" : "employee",
>>>> "fields":[
>>>>   {"name" : "name", "type" : "string", "default" : "NU"},
>>>>   {"name" : "age", "type" : "int","default" : 0},
>>>>   {"name" : "dept", "type": "string","default" : "DU"},
>>>>   {"name" : "office", "type": "string","default" : "OU"},
>>>>   {"name" : "salary", "type": "float","default" : 0.0}
>>>> ]
>>>> }
>>>>
>>>> {
>>>> "type" : "record",
>>>> "name" : "employee",
>>>> "fields":[
>>>>   {"name" : "name", "type" : "string", "default" : "NU"},
>>>>   {"name" : "age", "type" : "int","default" : 0},
>>>>   {"name" : "dept", "type": "string","default" : "DU"},
>>>>   {"name" : "office", "type": "string","default" : "OU"},
>>>>   {"name" : "salary", "type": "int", "default" : 0}
>>>> ]
>>>> }
>>>>
>>>>
>>>> The two schemas differ in only one field, "salary". As per the schema
>>>> evolution/merging rules, I expect the "int" field to be loaded as "float".
>>>> Instead, the job fails due to a field mismatch.
>>>>
>>>> I am referring to :
>>>>
>>>> Similar thread named "Working with changing schemas (avro) in Pig"
>>>>
>>>>
>>>> https://mail-archives.apache.org/mod_mbox/pig-user/201204.mbox/%[EMAIL PROTECTED]%3E
>>>>
>>>> JIRA:
>>>> https://issues.apache.org/jira/browse/PIG-2579
>>>> How to use the "multiple_schema" option with "AvroStorage" as suggested by