-
Pig Avrostorage Issue regarding Schema evaluation
Milind Vaidya 2013-01-09, 14:49
Environment:

Pig version: 0.11
Hadoop 0.23.6.0.1301071353

Script:

REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
REGISTER /homes/immilind/HadoopLocal/Jars/jackson-all-1.8.10.jar
REGISTER /homes/immilind/HadoopLocal/Jars/jackson-core-asl-1.8.10.jar
REGISTER /homes/immilind/HadoopLocal/Jars/jackson-jaxrs-1.8.10.jar
REGISTER /homes/immilind/HadoopLocal/Jars/jackson-mapper-asl-1.8.10.jar
REGISTER /homes/immilind/HadoopLocal/Jars/jackson-xc-1.8.10.jar
REGISTER /home/gs/pig/current/lib-hadoop23/piggybank.jar

employee = load '/user/immilind/AvroData' using
    org.apache.pig.piggybank.storage.avro.AvroStorage();
dump employee;

Schemas:

{
  "type" : "record",
  "name" : "employee",
  "fields" : [
    {"name" : "name", "type" : "string", "default" : "NU"},
    {"name" : "age", "type" : "int", "default" : 0},
    {"name" : "dept", "type" : "string", "default" : "DU"},
    {"name" : "office", "type" : "string", "default" : "OU"},
    {"name" : "salary", "type" : "float", "default" : 0.0}
  ]
}

{
  "type" : "record",
  "name" : "employee",
  "fields" : [
    {"name" : "name", "type" : "string", "default" : "NU"},
    {"name" : "age", "type" : "int", "default" : 0},
    {"name" : "dept", "type" : "string", "default" : "DU"},
    {"name" : "office", "type" : "string", "default" : "OU"},
    {"name" : "salary", "type" : "int", "default" : 0}
  ]
}

The two schemas differ in only one field: salary is "float" in the first and "int" in the second. The schema evolution/merging rules led me to expect the "int" values to be loaded as "float". Instead, the job fails with a field mismatch.

I am referring to:

A similar thread, "Working with changing schemas (avro) in Pig":
https://mail-archives.apache.org/mod_mbox/pig-user/201204.mbox/%[EMAIL PROTECTED]%3E

JIRA: https://issues.apache.org/jira/browse/PIG-2579
How do I use the "multiple_schemas" option with AvroStorage, as this JIRA suggests?
The mergeType function, which defines the merge rules for primitive types:
https://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java

Can anybody suggest what is going wrong?
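The Avro specification's schema-resolution rules allow several numeric promotions. An int can be promoted to long, float, or double. A long can be promoted to float or double, and a float can be promoted to double. The minimal Java sketch below shows the widening the question expects for the salary field. It assumes numeric primitives only, and the class `PrimitiveMerge` and its method `widerType` are hypothetical: they are not part of piggybank and are not a copy of `AvroStorageUtils.mergeType`.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of numeric widening when merging two Avro primitive types. */
public class PrimitiveMerge {
    // Numeric Avro primitives, narrowest to widest; under the Avro spec's
    // promotion rules each type can be read as any type that follows it.
    private static final List<String> NUMERIC_ORDER =
            Arrays.asList("int", "long", "float", "double");

    /**
     * Returns the type both inputs can be read as, or null when no
     * promotion applies (the "field mismatch" case).
     */
    public static String widerType(String a, String b) {
        if (a.equals(b)) {
            return a; // identical types need no promotion
        }
        int ia = NUMERIC_ORDER.indexOf(a);
        int ib = NUMERIC_ORDER.indexOf(b);
        if (ia < 0 || ib < 0) {
            return null; // at least one side is non-numeric: no widening rule here
        }
        return NUMERIC_ORDER.get(Math.max(ia, ib));
    }

    public static void main(String[] args) {
        // The salary field from the two employee schemas above.
        System.out.println(widerType("float", "int"));  // float
        System.out.println(widerType("string", "int")); // null
    }
}
```

The argument order does not matter, because the sketch always keeps the wider of the two ranks.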
-
Re: Pig Avrostorage Issue regarding Schema evaluation
Cheolsoo Park 2013-01-09, 18:26
Hi Milind,

Please try this:

REGISTER build/ivy/lib/Pig/avro-1.7.1.jar
REGISTER build/ivy/lib/Pig/json-simple-1.1.jar
REGISTER build/ivy/lib/Pig/jackson-mapper-asl-1.8.8.jar
REGISTER build/ivy/lib/Pig/jackson-core-asl-1.8.8.jar
REGISTER contrib/piggybank/java/piggybank.jar

employee = LOAD '/home/cheolsoo/workspace/avro/emplyees' USING
    org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
DESCRIBE employee;
DUMP employee;

I have two Avro files in my input directory:

$ java -jar /home/cheolsoo/workspace/avro/avro-tools-1.7.1.jar tojson record_employee.avro
{"name":"a","age":0,"dept":"b","office":"c","salary":0.0}

$ java -jar /home/cheolsoo/workspace/avro/avro-tools-1.7.1.jar tojson record_employee2.avro
{"name":"a","age":0,"dept":"b","office":"c","salary":0}

record_employee.avro contains a float, and record_employee2.avro contains an int.

The output looks as follows:

...
employee: {name: chararray,age: int,dept: chararray,office: chararray,salary: float}
...
(a,0,b,c,0.0)
(a,0,b,c,0)

Thanks,
Cheolsoo
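The following minimal Java sketch shows why DESCRIBE can report `salary: float` once the two files' schemas are merged. It assumes flat records with primitive fields only. Each schema is modelled as a map from field name to Avro type, and the class `RecordMerge` with its methods `merge`, `describe`, and `employeeSchema` is hypothetical. This is not how piggybank's AvroStorage is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: merge two flat Avro record schemas and print them Pig-style. */
public class RecordMerge {
    // Narrowest to widest; a numeric type may be promoted to any type to its right.
    private static final String[] NUMERIC = {"int", "long", "float", "double"};

    private static int rank(String type) {
        for (int i = 0; i < NUMERIC.length; i++) {
            if (NUMERIC[i].equals(type)) return i;
        }
        return -1; // not numeric
    }

    /** Merges field by field, keeping first-seen order; throws on incompatible types. */
    public static Map<String, String> merge(Map<String, String> a, Map<String, String> b) {
        Map<String, String> out = new LinkedHashMap<>(a);
        for (Map.Entry<String, String> e : b.entrySet()) {
            String prev = out.get(e.getKey());
            String cur = e.getValue();
            if (prev == null || prev.equals(cur)) {
                out.put(e.getKey(), cur); // new field, or same type in both schemas
                continue;
            }
            int ra = rank(prev), rb = rank(cur);
            if (ra < 0 || rb < 0) {
                throw new IllegalArgumentException("field mismatch: " + e.getKey());
            }
            out.put(e.getKey(), NUMERIC[Math.max(ra, rb)]); // widen, e.g. int + float -> float
        }
        return out;
    }

    // Avro -> Pig name for the types this sketch handles: string becomes chararray,
    // and the numeric type names are the same in both.
    private static String pigType(String avroType) {
        return avroType.equals("string") ? "chararray" : avroType;
    }

    /** Renders the schema in the same shape as Pig's DESCRIBE line in the reply. */
    public static String describe(String alias, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder(alias).append(": {");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(',');
            sb.append(e.getKey()).append(": ").append(pigType(e.getValue()));
            first = false;
        }
        return sb.append('}').toString();
    }

    /** The employee record from the thread, with a configurable salary type. */
    public static Map<String, String> employeeSchema(String salaryType) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("name", "string");
        m.put("age", "int");
        m.put("dept", "string");
        m.put("office", "string");
        m.put("salary", salaryType);
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> merged = merge(employeeSchema("float"), employeeSchema("int"));
        System.out.println(describe("employee", merged));
    }
}
```

Running `main` prints `employee: {name: chararray,age: int,dept: chararray,office: chararray,salary: float}`, which matches the DESCRIBE line in the reply. The sketch models only the merged schema, not how values are converted. Note that the DUMP output in the reply still prints the int record's salary as 0 rather than 0.0.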
-
Re: Pig Avrostorage Issue regarding Schema evaluation
Russell Jurney 2013-01-09, 19:25
Jackson is no longer needed, right? Or is it coming back in 0.11?

Russell Jurney http://datasyndrome.com
-
Re: Pig Avrostorage Issue regarding Schema evaluation
Cheolsoo Park 2013-01-09, 22:15
Hi Russell,

You're absolutely right. Jackson is not needed. Thanks for pointing that out!

Cheolsoo
-
Re: Pig Avrostorage Issue regarding Schema evaluation
Russell Jurney 2013-01-09, 22:37
Np, I included it myself forever before someone pointed that out :)

Russell Jurney http://datasyndrome.com