Avro, mail # user - Re: Problem with Pig AvroStorage, with Avros that work in Ruby and Python


Re: Problem with Pig AvroStorage, with Avros that work in Ruby and Python
Russell Jurney 2012-02-02, 22:49
A little bit more searching shows this:

http://www.harshj.com/2010/04/25/writing-and-reading-avro-data-files-using-python/

On Thu, Feb 2, 2012 at 2:48 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> The jars being used are:
>
> REGISTER /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar
> REGISTER /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar
> REGISTER /me/pig/contrib/piggybank/java/piggybank.jar
> REGISTER /me/pig/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar
> REGISTER /me/pig/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar
>
> On Thu, Feb 2, 2012 at 2:41 PM, James Baldassari <[EMAIL PROTECTED]> wrote:
>
>> Hi Russell,
>>
>> I'm not sure about the Python error, but the Java error looks like a
>> classpath problem, not a schema parsing issue.  The NoSuchMethodError in
>> the stack trace indicates that Avro was trying to invoke a method in the
>> Jackson library that wasn't present at run-time.  My guess is that your
>> program (or Pig?) either has two incompatible versions of the Jackson
>> library on its classpath or maybe Avro's Jackson dependency has been
>> excluded and a version that is incompatible with Avro is on the classpath.
>>
>> Which version of Avro is being used?  Running 'mvn dependency:tree' in
>> Avro trunk I see that it's depending on Jackson 1.8.6.  Can you verify that
>> only one version of Jackson is on the classpath and that it's the version
>> that is required by whatever version of Avro is on the classpath?
>>
>> -James
>>
>>
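
One way to act on the suggestion above is to list which version of each library the registered jars carry. The following is a minimal sketch, not part of the original exchange: the helper names (`versions_by_artifact`, `conflicts`) are hypothetical, and it only inspects the file names from the REGISTER statements quoted in this thread. It cannot see jars that Pig or Hadoop add to the classpath on their own, so an empty result does not rule out a conflict.

```python
import re
from collections import defaultdict

# Jar paths copied from the REGISTER statements quoted in this thread.
JARS = [
    "/me/pig/build/ivy/lib/Pig/avro-1.5.3.jar",
    "/me/pig/build/ivy/lib/Pig/json-simple-1.1.jar",
    "/me/pig/contrib/piggybank/java/piggybank.jar",
    "/me/pig/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar",
    "/me/pig/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar",
]

# Assumes the common "<artifact>-<version>.jar" naming convention.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d+(?:\.\d+)*)\.jar$")


def versions_by_artifact(paths):
    """Map each artifact name to the set of versions seen in the jar names."""
    found = defaultdict(set)
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        match = JAR_NAME.match(name)
        if match:  # jars without a version suffix (piggybank.jar) are skipped
            found[match.group("artifact")].add(match.group("version"))
    return dict(found)


def conflicts(paths):
    """Return only the artifacts that appear with more than one version."""
    return {
        artifact: sorted(versions)
        for artifact, versions in versions_by_artifact(paths).items()
        if len(versions) > 1
    }


if __name__ == "__main__":
    for artifact, versions in sorted(versions_by_artifact(JARS).items()):
        print(artifact, sorted(versions))
    print("conflicts:", conflicts(JARS))
```

Among the registered jars there is a single Jackson version, 1.7.3, so any second copy would have to come from elsewhere on the classpath. Whether 1.7.3 is the version that Avro 1.5.3 expects is not settled by this thread; the 1.8.6 figure above refers to Avro trunk.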
>>
>> On Thu, Feb 2, 2012 at 5:21 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>
>>> Correction: when I read the file in Python, I get the error below.  It
>>> looks like a Unicode problem?  Can one tell Avro how to handle this?
>>>
>>> Traceback (most recent call last):
>>>   File "./cat_avro", line 21, in <module>
>>>     for record in df_reader:
>>>   File
>>> "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/datafile.py",
>>> line 354, in next
>>>     datum = self.datum_reader.read(self.datum_decoder)
>>>   File
>>> "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py",
>>> line 445, in read
>>>     return self.read_data(self.writers_schema, self.readers_schema,
>>> decoder)
>>>   File
>>> "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py",
>>> line 490, in read_data
>>>     return self.read_record(writers_schema, readers_schema, decoder)
>>>   File
>>> "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py",
>>> line 690, in read_record
>>>     field_val = self.read_data(field.type, readers_field.type, decoder)
>>>   File
>>> "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py",
>>> line 488, in read_data
>>>     return self.read_union(writers_schema, readers_schema, decoder)
>>>   File
>>> "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py",
>>> line 654, in read_union
>>>     return self.read_data(selected_writers_schema, readers_schema,
>>> decoder)
>>>   File
>>> "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py",
>>> line 458, in read_data
>>>     return self.read_data(writers_schema, s, decoder)
>>>   File
>>> "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py",
>>> line 468, in read_data
>>>     return decoder.read_utf8()
>>>   File
>>> "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py",
>>> line 233, in read_utf8
>>>     return unicode(self.read_bytes(), "utf-8")
>>> UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 543:
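
The error at the bottom of this traceback can be reproduced without Avro. The sketch below is only an illustration: `SAMPLE` and `try_utf8` are hypothetical, and the idea that the field holds text written in a single-byte encoding such as Latin-1 is an assumption that the thread does not confirm. The Avro specification defines the `string` type as UTF-8, and a lone 0xa0 byte is not valid UTF-8, which is why `read_utf8` fails.

```python
# Hypothetical payload: ASCII text with a lone 0xa0 byte, which is a
# no-break space in Latin-1 but not a valid UTF-8 sequence on its own.
SAMPLE = b"price:\xa0100"


def try_utf8(raw):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError as err:
        return None, err


text, err = try_utf8(SAMPLE)
if err is not None:
    # err.start is the offset of the offending byte, like "position 543"
    # in the traceback above.
    print("UTF-8 decode failed at byte offset", err.start)

# Assumption: the writer used Latin-1. Decoding that way succeeds, and
# re-encoding produces the two-byte UTF-8 form (0xc2 0xa0) that a reader
# of an Avro string field expects.
as_latin1 = SAMPLE.decode("latin-1")
fixed = as_latin1.encode("utf-8")
print(repr(fixed))
```

If that assumption holds, the fix belongs on the writing side: encode the text as UTF-8 before it goes into a `string` field, or declare the field as `bytes` if the content is not text at all.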

Russell Jurney
twitter.com/rjurney
[EMAIL PROTECTED]
datasyndrome.com