-
Re: Problem with Pig AvroStorage, with Avros that work in Ruby and Python
Russell Jurney 2012-02-02, 22:21
Correction: when I read the file in Python, I get the error below. It looks like a unicode problem? Can one tell Avro how to handle this?
Traceback (most recent call last):
  File "./cat_avro", line 21, in <module>
    for record in df_reader:
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/datafile.py", line 354, in next
    datum = self.datum_reader.read(self.datum_decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 445, in read
    return self.read_data(self.writers_schema, self.readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 490, in read_data
    return self.read_record(writers_schema, readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 690, in read_record
    field_val = self.read_data(field.type, readers_field.type, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 488, in read_data
    return self.read_union(writers_schema, readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 654, in read_union
    return self.read_data(selected_writers_schema, readers_schema, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 458, in read_data
    return self.read_data(writers_schema, s, decoder)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 468, in read_data
    return decoder.read_utf8()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 233, in read_utf8
    return unicode(self.read_bytes(), "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 543: invalid start byte

On Thu, Feb 2, 2012 at 2:06 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> I am writing Avro records in Ruby using the avro ruby gem in 1.8.7. I
> have problems with loading these files sometimes. As a result, I am unable
> to write large files that are readable.
>
> The exception I get is below. Anyone have an idea what this means? It
> looks like Avro is having trouble parsing the schema. The avro files parse
> in Ruby and Python, just not Pig. Are there more rigorous checks in Java?
>
> Pig Stack Trace
> ---------------
> ERROR 2998: Unhandled internal error.
> org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;
>
> java.lang.NoSuchMethodError: org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;
>     at org.apache.avro.Schema.<clinit>(Schema.java:82)
>     at org.apache.pig.piggybank.storage.avro.AvroStorageUtils.<clinit>(AvroStorageUtils.java:49)
>     at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:163)
>     at org.apache.pig.piggybank.storage.avro.AvroStorage.getAvroSchema(AvroStorage.java:144)
>     at org.apache.pig.piggybank.storage.avro.AvroStorage.getSchema(AvroStorage.java:269)
>     at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:150)
>     at org.apache.pig.newplan.logical.relational.LOLoad.getSchema(LOLoad.java:109)
>     at org.apache.pig.newplan.logical.visitor.LineageFindRelVisitor.visit(LineageFindRelVisitor.java:100)
>     at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:218)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
-
Re: Problem with Pig AvroStorage, with Avros that work in Ruby and Python
James Baldassari 2012-02-02, 22:41
Hi Russell,
I'm not sure about the Python error, but the Java error looks like a classpath problem, not a schema parsing issue. The NoSuchMethodError in the stack trace indicates that Avro was trying to invoke a method in the Jackson library that wasn't present at run-time. My guess is that your program (or Pig?) either has two incompatible versions of the Jackson library on its classpath or maybe Avro's Jackson dependency has been excluded and a version that is incompatible with Avro is on the classpath.
Which version of Avro is being used? Running 'mvn dependency:tree' in Avro trunk I see that it's depending on Jackson 1.8.6. Can you verify that only one version of Jackson is on the classpath and that it's the version that is required by whatever version of Avro is on the classpath?
-James
-
Re: Problem with Pig AvroStorage, with Avros that work in Ruby and Python
Russell Jurney 2012-02-02, 22:48
The jars being used are:
REGISTER /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar
REGISTER /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar
REGISTER /me/pig/contrib/piggybank/java/piggybank.jar
REGISTER /me/pig/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar
REGISTER /me/pig/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar
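The check James asks for (only one version of each library on the classpath) can be roughed out with a short script. This is a minimal sketch in Python 3, not something from the thread: the helper names `split_jar_name` and `find_version_conflicts` are made up for illustration, the version-parsing pattern is a guess that fits the file names listed here, and the extra `jackson-core-asl-1.0.1.jar` path is purely hypothetical, added to show what a conflict would look like. It only inspects file names; it cannot see jars that Pig or Hadoop add to the classpath on their own.

```python
import re
from collections import defaultdict

# Assumed naming scheme: "<artifact>-<version>.jar", version starting with a digit.
JAR_PATTERN = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.]*)\.jar$")

# Jar paths copied from the REGISTER statements in this message.
THREAD_JARS = [
    "/me/pig/build/ivy/lib/Pig/avro-1.5.3.jar",
    "/me/pig/build/ivy/lib/Pig/json-simple-1.1.jar",
    "/me/pig/contrib/piggybank/java/piggybank.jar",
    "/me/pig/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar",
    "/me/pig/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar",
]


def split_jar_name(path):
    """Return (artifact, version); version is None for unversioned jars."""
    filename = path.rsplit("/", 1)[-1]
    match = JAR_PATTERN.match(filename)
    if match is None:
        return filename, None
    return match.group("artifact"), match.group("version")


def find_version_conflicts(jar_paths):
    """Map each artifact that appears in more than one version to its versions."""
    versions = defaultdict(set)
    for path in jar_paths:
        artifact, version = split_jar_name(path)
        if version is not None:
            versions[artifact].add(version)
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


if __name__ == "__main__":
    # Hypothetical second Jackson jar, NOT from the thread, to show a conflict.
    extra = ["/hypothetical/lib/jackson-core-asl-1.0.1.jar"]
    print(find_version_conflicts(THREAD_JARS))
    print(find_version_conflicts(THREAD_JARS + extra))
```

The registered jars alone report no conflict, so if the NoSuchMethodError persists, the second Jackson copy would have to come from somewhere outside this list; the script would need the full classpath as input to find it.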
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
-
Re: Problem with Pig AvroStorage, with Avros that work in Ruby and Python
Russell Jurney 2012-02-02, 22:49
A little bit more searching shows this: http://www.harshj.com/2010/04/25/writing-and-reading-avro-data-files-using-python/
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
-
Re: Problem with Pig AvroStorage, with Avros that work in Ruby and Python
Russell Jurney 2012-02-02, 22:53
Further examination shows that the problematic emails I am encoding are formatted in ISO-8859-1, not UTF-8. That is why I am getting character problems. Looks like it is not an Avro problem after all. Thanks! :)
Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
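The resolution in the last message (ISO-8859-1 bytes being read as UTF-8) can be reproduced in a few lines. The sketch below uses Python 3 rather than the Python 2.6 in the traceback, and the sample bytes and the `to_text` helper are invented for illustration; the thread does not show the actual email data or how it was fixed. It shows why 0xa0 fails as a UTF-8 start byte, and one way to normalise text before it is written into an Avro string field.

```python
# Hypothetical sample: "Hello", a non-breaking space (0xa0 in ISO-8859-1), "world".
LATIN1_BYTES = b"Hello\xa0world"


def to_text(raw, encodings=("utf-8", "iso-8859-1")):
    """Decode raw bytes by trying each encoding in order.

    ISO-8859-1 maps every byte to a character, so with the default
    list the fallback always succeeds; it is a guess, not detection.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the input")


if __name__ == "__main__":
    try:
        LATIN1_BYTES.decode("utf-8")
    except UnicodeDecodeError as error:
        # Same failure class as in the traceback: 0xa0 cannot start a UTF-8 sequence.
        print(error.reason, "at position", error.start)

    text = to_text(LATIN1_BYTES)
    # Re-encoding yields valid UTF-8, where the same character takes two bytes.
    print(text.encode("utf-8"))
```

Trying UTF-8 first keeps already-correct input untouched. The fallback is only safe when the non-UTF-8 input really is ISO-8859-1, as Russell found for these emails; for mixed sources, the charset declared in each email's headers is a better guide.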