-
Simple AvroStorage LOAD and STORE with Avro 1.6.0
Andrew Kenworthy 2012-01-09, 09:15

Hello,

When I run a simple Pig script to LOAD and STORE Avro data, I get:

java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord

Script:

REGISTER /tmp/avro-1.6.0.jar;
--REGISTER /tmp/avro-1.5.4.jar
--REGISTER /tmp/avro-1.4.1.jar;

REGISTER /tmp/piggybank-0.9.1.jar;
REGISTER /tmp/json-simple-1.1.jar;
REGISTER /tmp/jackson-core-asl-1.8.4.jar;
REGISTER /tmp/jackson-mapper-asl-1.8.4.jar;

avroData=LOAD '$DATA_INPUTDIR' USING org.apache.pig.piggybank.storage.avro.AvroStorage();

dataSubset = FOREACH avroData GENERATE myField1, myField2;
describe dataSubset;
-----------------------------------------------
-- shows:
-- dataSubset : {myField1: int,myField2: int}
-----------------------------------------------
STORE dataSubset INTO '$OUTPUTDIR' USING org.apache.pig.piggybank.storage.avro.AvroStorage();

If I use the 1.5.4 jar I get the same error, but the script works with the 1.4.1 version. If I write just one field, it works with 1.6.0.

I see that related issues have been fixed here:

https://issues.apache.org/jira/browse/PIG-2202
https://issues.apache.org/jira/browse/PIG-2195

Can anyone confirm that this or something similar works with Avro 1.6.0, and/or point me in the right direction concerning where the problem may lie?

Many thanks,

Andrew
-
Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0
Stan Rosenberg 2012-01-09, 16:30

Andrew,

The source of the problem may be AvroStorage in piggybank. Could you please include the entire stack trace?

stan
-
Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0
Andrew Kenworthy 2012-01-10, 13:03

Hi Stan,

Here's the full stack trace:

org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:261)
    at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:580)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:530)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord
    at org.apache.avro.generic.GenericData.getField(GenericData.java:525)
    at org.apache.avro.generic.GenericData.getField(GenericData.java:540)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:103)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:255)
    ... 18 more

Andrew
-
Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0
Stan Rosenberg 2012-01-10, 16:36

Andrew,

Something looks odd in this stack trace:

Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord
> at org.apache.avro.generic.GenericData.getField(GenericData.java:525)
> at org.apache.avro.generic.GenericData.getField(GenericData.java:540)
> at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:103)
> at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
> at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)

PigAvroDatumWriter overrides 'GenericDatumWriter.writeRecord' in order to extract values from a tuple. Thus, I would expect the third method invocation to be PigAvroDatumWriter.writeRecord. Perhaps someone else has more insight as to why it's not getting invoked. In the meantime, please confirm that both PigAvroDatumWriter and GenericDatumWriter are loaded from the right jar files. (You can do this by temporarily changing the pig script to invoke the JVM with 'java -verbose' and 'grep' the output for these classes.)

Best,

stan
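
One way a subclass method can stop being invoked, as Stan describes, is when the superclass hook changes its signature between library versions: the subclass method then becomes a mere overload and dynamic dispatch no longer reaches it. The thread does not confirm that this is exactly what happened to PigAvroDatumWriter, so the following is only a minimal sketch of the mechanism. BaseWriter, TupleWriter, and DispatchDemo are hypothetical stand-ins, not the real Avro or piggybank classes.

```java
// Hypothetical stand-ins; these are NOT the real Avro or piggybank classes.
class BaseWriter {
    // "New" library version: the hook now takes an extra argument.
    protected String writeRecord(Object datum, int state) {
        return "base";
    }

    // Public entry point that dispatches to the two-argument hook.
    public String write(Object datum) {
        return writeRecord(datum, 0);
    }
}

class TupleWriter extends BaseWriter {
    // Written against the "old" one-argument hook. Without @Override this
    // still compiles, but it is now only an overload that write() never calls.
    protected String writeRecord(Object datum) {
        return "subclass";
    }
}

class DispatchDemo {
    public static void main(String[] args) {
        // The base implementation runs, not the subclass one.
        System.out.println(new TupleWriter().write("tuple")); // prints "base"
    }
}
```

Annotating the subclass method with @Override would turn this silent behavior change into a compile error, which is only visible when the subclass is recompiled against the newer library.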
-
Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0
Andrew Kenworthy 2012-01-11, 09:47

Hi Stan,

Thank you for your feedback. I've run the script passing "-D mapred.child.java.opts=-verbose:class" and have the following in my logs:

[Loaded org.apache.avro.generic.GenericDatumWriter from file:/var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/ankenworthy/jobcache/job_201111230039_0146/jars/job.jar]
[Loaded org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter from file:/var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/ankenworthy/jobcache/job_201111230039_0146/jars/job.jar]

I assume the .../job_201111230039_0146/jars/job.jar is the one prepared by Pig using the jars I have REGISTER-ed, in which case the classes are the ones I expect, or have I misread that?

Regards,

Andrew
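
As a complement to -verbose:class, the same question (which jar did a class come from?) can be asked from inside a JVM through the class's protection domain. This is a minimal sketch using only standard JDK calls; ClassOrigin and locationOf are hypothetical names, and the class names to inspect are passed on the command line.

```java
class ClassOrigin {
    // Returns the URL a class was loaded from, or a placeholder when the JVM
    // records no code source (typical for classes from the JDK itself).
    static String locationOf(Class<?> cls) {
        java.security.CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Each argument is a fully qualified class name that must be on the classpath.
        for (String name : args) {
            System.out.println(name + " <- " + locationOf(Class.forName(name)));
        }
    }
}
```

Run with the Avro and piggybank jars on the classpath and pass, for example, org.apache.avro.generic.GenericDatumWriter as an argument. Note that inside a Hadoop task the answer would be the job.jar shown in the log above, so this says which file was used, not which Avro version was bundled into it.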
-
Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0
Scott Carey 2012-01-16, 21:16

FYI:

https://issues.apache.org/jira/browse/AVRO-993

I expect that Avro 1.6.2 will add these methods back in.
-
Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0
Russell Jurney 2012-01-09, 20:47

I could only make AvroStorage work with Avro 1.4.1.

Russell Jurney
twitter.com/rjurney
[EMAIL PROTECTED]
datasyndrome.com
-
Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0
Stan Rosenberg 2012-01-09, 20:52

Generally, AvroStorage works fine for us with Avro 1.6. However, we also patched AvroStorage on a couple of occasions, e.g., see PIG-2330.

stan
-
Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0
Russell Jurney 2012-01-09, 21:21

Avro 1.4.1 only works for me with PIG-2411 applied.

Russell Jurney
twitter.com/rjurney
[EMAIL PROTECTED]
datasyndrome.com
-
Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0
Bill Graham 2012-01-10, 00:42

I'd be cautious of using AvroStorage in its current state with 1.6.0. Running the piggybank unit tests against 1.6.0 causes compile failures, due to non-backward-compatible Avro changes in 1.6.0.

GenericDatumReader.newRecord(Object old, Schema schema) has gone away in Avro 1.6.0.

[javac] /Users/billg/ws/git/pig/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroDatumReader.java:136: method does not override or implement a method from a supertype
[javac] @Override
[javac] ^

I've just created this, FYI:

https://issues.apache.org/jira/browse/PIG-2463
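
The compile failure Bill reports only shows up when piggybank is rebuilt against the newer Avro jar. A prebuilt piggybank jar gives no such warning, so it can be useful to check by reflection whether the library on the classpath still declares a method that a subclass relies on. The sketch below assumes nothing about Avro itself: MethodProbe and declaresMethod are hypothetical names, and the demo probes java.util.ArrayList so that it runs without any extra jars.

```java
class MethodProbe {
    // True if cls or any of its superclasses declares a method with exactly
    // this name and these (erased) parameter types, whatever its visibility.
    static boolean declaresMethod(Class<?> cls, String name, Class<?>... paramTypes) {
        for (Class<?> c = cls; c != null; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod(name, paramTypes);
                return true;
            } catch (NoSuchMethodException e) {
                // Not declared at this level; keep walking up the hierarchy.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // ArrayList.add(E) erases to add(Object), so the first probe succeeds.
        System.out.println(declaresMethod(java.util.ArrayList.class, "add", Object.class));
        // There is no add(String) signature, so the second probe fails.
        System.out.println(declaresMethod(java.util.ArrayList.class, "add", String.class));
    }
}
```

With Avro on the classpath, the same helper could be pointed at org.apache.avro.generic.GenericDatumReader with the name "newRecord" and the parameter types Object and org.apache.avro.Schema, which is the signature Bill's message says disappeared in 1.6.0.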