Re: [jira] Commented: (PIG-1841) TupleSize implemented incorrectly
Wait, did he implement outputSchema?  According to the wiki, if a UDF doesn't provide a schema, Pig assumes the result is a single field of type bytearray.

http://wiki.apache.org/pig/UDFManual

Is that the problem?

Sent from my iPhone

On Feb 9, 2011, at 7:39 PM, "Daniel Dai (JIRA)" <[EMAIL PROTECTED]> wrote:

>
>    [ https://issues.apache.org/jira/browse/PIG-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992838#comment-12992838 ]
>
> Daniel Dai commented on PIG-1841:
> ---------------------------------
>
> +1
>
>> TupleSize implemented incorrectly
>> ---------------------------------
>>
>>                Key: PIG-1841
>>                URL: https://issues.apache.org/jira/browse/PIG-1841
>>            Project: Pig
>>         Issue Type: Bug
>>   Affects Versions: 0.8.0
>>           Reporter: Eric Tschetter
>>           Assignee: Laukik Chitnis
>>            Fix For: 0.8.0, 0.9.0
>>
>>        Attachments: PIG-1841.patch
>>
>>
>> I sent this to the list:
>> I'm looking at Pig's TupleSize implementation and wondering if it's
>> implemented correctly:
>>   @Override
>>   public Long exec(Tuple input) throws IOException {
>>       try{
>>           if (input == null) return null;
>>           return Long.valueOf(input.size());
>>       }catch(Exception e){
>>           int errCode = 2106;
>>           String msg = "Error while computing size in " + this.getClass().getSimpleName();
>>           throw new ExecException(msg, errCode, PigException.BUG, e);
>>       }
>>   }
>> I have a script that looks like
>> A = FOREACH A GENERATE STRSPLIT(value, '\u0001') AS values;
>> B = FOREACH B GENERATE values, SIZE(values) AS cnt;
>> and cnt always ends up as 1.  From the code, it looks like TupleSize
>> is intended to only return the number of arguments into the SIZE()
>> UDF?  Is that really the intention and I'm using the SIZE() UDF wrong?
>> Or, is it just a bug and it's supposed to be written as "return
>> Long.valueOf(((Tuple) input.get(0)).size())"?
>> I got this response back:
>> This is definitely a bug. Can you open a Jira ticket?
>> Done!
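
For anyone skimming the thread, here is a rough, Pig-free sketch of the difference being described. It is only an illustration under stated assumptions: a plain java.util.List stands in for Pig's Tuple, and the class and method names (TupleSizeSketch, argumentCount, firstArgumentSize) are made up for this example, not taken from the Pig source or from PIG-1841.patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: a List<Object> plays the role of Pig's Tuple.
class TupleSizeSketch {

    // Mirrors the quoted exec(): returns the size of the argument wrapper,
    // i.e. how many arguments were passed to the function.
    static Long argumentCount(List<Object> input) {
        if (input == null) return null;
        return Long.valueOf(input.size());
    }

    // Mirrors the fix suggested in the issue: unwrap the first argument
    // and return its size. The extra null/empty checks are an assumption.
    static Long firstArgumentSize(List<Object> input) {
        if (input == null || input.isEmpty() || input.get(0) == null) return null;
        return Long.valueOf(((List<?>) input.get(0)).size());
    }

    public static void main(String[] args) {
        // The "values" tuple, e.g. the result of splitting a string into 3 parts.
        List<Object> values = Arrays.<Object>asList("a", "b", "c");
        // The call SIZE(values) wraps that one argument in an outer tuple.
        List<Object> input = Arrays.<Object>asList(values);

        System.out.println(argumentCount(input));     // prints 1
        System.out.println(firstArgumentSize(input)); // prints 3
    }
}
```

With a single argument the first method always yields 1, which matches the cnt the reporter keeps seeing, while the second yields the number of fields in the argument itself.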