Pig user mailing list: Jython UDF problem


Re: Jython UDF problem
It seems like a bug in Jython:
>>> import time
>>> tuple_time = time.strptime('2006-10-16T08:19:39', "%Y-%m-%dT%H:%M:%S")
>>> tuple_time.tm_hour
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'tm_hour'
>>> tuple_time[3]
8

Changing return str(tuple_time.tm_hour) to return str(tuple_time[3])
seems to fix the issue.
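A minimal sketch of the hour function from udfs.py with this change applied, assuming the rest of the script stays as posted. The only difference is reading the hour by position (index 3 of the 9-field time tuple) instead of through the tm_hour attribute. Positional access works both on the plain tuple that Jython's strptime returned here and on CPython's time.struct_time, so the UDF no longer depends on which one the interpreter hands back.

```python
import time

# Format string taken from udfs.py in this thread.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S"


def hour(iso_string):
    """Return the hour of an ISO-style timestamp as a string."""
    tuple_time = time.strptime(iso_string, ISO_FORMAT)
    # Index 3 is the hour field of the time tuple. Unlike .tm_hour, it also
    # works when strptime returns a plain tuple with no named fields.
    return str(tuple_time[3])
```

Called with the timestamp from the interactive session above, hour('2006-10-16T08:19:39') returns '8'.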

Daniel

On Sun, Feb 5, 2012 at 12:44 AM, Aniket Mokashi <[EMAIL PROTECTED]> wrote:
> Looks like this is a Jython bug.
>
> By the way, as far as I know, the return type of this function would be
> bytearray if no outputSchema decorator is specified.
>
> Thanks,
> Aniket
>
> On Sat, Feb 4, 2012 at 9:39 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>
>> Why am I getting tuple objects in my Python UDFs? This isn't how the
>> examples work.
>>
>> Error:
>>
>> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error executing function
>>     at org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:106)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:216)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:275)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:320)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLimit.getNext(POLimit.java:85)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:267)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>> Caused by: Traceback (most recent call last):
>>  File "udfs.py", line 27, in hour
>>    return tuple_time.tm_hour
>> AttributeError: 'tuple' object has no attribute 'tm_hour'
>>
>>
>> udfs.py:
>>
>> #!/usr/bin/python
>>
>> import time
>>
>> def hour(iso_string):
>>  tuple_time = time.strptime(iso_string, "%Y-%m-%dT%H:%M:%S")
>>  return str(tuple_time.tm_hour)
>>
>>
>> my.pig:
>>
>> register /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar
>> register /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar
>> register /me/pig/contrib/piggybank/java/piggybank.jar
>> register /me/pig/build/ivy/lib/Pig/jackson-core-asl-1.7.3.jar
>> register /me/pig/build/ivy/lib/Pig/jackson-mapper-asl-1.7.3.jar
>>
>> define AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>> define CustomFormatToISO org.apache.pig.piggybank.evaluation.datetime.convert.CustomFormatToISO();
>> define substr org.apache.pig.piggybank.evaluation.string.SUBSTRING();
>>
>> register 'udfs.py' using jython as agiledata;
>>
>> rmf /tmp/sent_distribution.txt
>>
>> /* Get email address pairs for each type of connection, and union them
>> together */
>> emails = load '/me/tmp/test_inbox' using AvroStorage();
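Aniket's note about the return type can be illustrated with a sketch of udfs.py that declares a schema. It assumes that Pig supplies the outputSchema decorator when a script is registered "using jython", as the Pig UDF documentation shows for Jython UDFs; the fallback definition is a hypothetical stand-in that only lets the file be imported and tested outside Pig. The schema string "hour:chararray" is an illustrative choice matching the str return value, and the hour is read by index for the reason Daniel gives.

```python
import time

try:
    # Assumed to be provided by Pig's Jython engine when the script is registered.
    outputSchema
except NameError:
    # Hypothetical stand-in for running outside Pig: it only records the
    # schema string on the decorated function and returns it unchanged.
    def outputSchema(schema):
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("hour:chararray")
def hour(iso_string):
    tuple_time = time.strptime(iso_string, "%Y-%m-%dT%H:%M:%S")
    # Positional index 3 is the hour field; it avoids the tm_hour
    # AttributeError seen in the traceback.
    return str(tuple_time[3])
```

With a declared schema, Pig should treat the result as chararray rather than defaulting to bytearray.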