Pig >> mail # dev >> Re: [jira] Commented: (PIG-1841) TupleSize implemented incorrectly


Re: [jira] Commented: (PIG-1841) TupleSize implemented incorrectly
Wait, did he implement outputSchema? According to the wiki, if a UDF does not provide a schema, Pig assumes its output is a single field of type bytearray.

http://wiki.apache.org/pig/UDFManual

Is that the problem?

On Feb 9, 2011, at 7:39 PM, "Daniel Dai (JIRA)" <[EMAIL PROTECTED]> wrote:

>
>    [ https://issues.apache.org/jira/browse/PIG-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992838#comment-12992838 ]
>
> Daniel Dai commented on PIG-1841:
> ---------------------------------
>
> +1
>
>> TupleSize implemented incorrectly
>> ---------------------------------
>>
>>                Key: PIG-1841
>>                URL: https://issues.apache.org/jira/browse/PIG-1841
>>            Project: Pig
>>         Issue Type: Bug
>>   Affects Versions: 0.8.0
>>           Reporter: Eric Tschetter
>>           Assignee: Laukik Chitnis
>>            Fix For: 0.8.0, 0.9.0
>>
>>        Attachments: PIG-1841.patch
>>
>>
>> I sent this to the list:
>> I'm looking at Pig's TupleSize implementation and wondering if it's
>> implemented correctly:
>>   @Override
>>   public Long exec(Tuple input) throws IOException {
>>       try{
>>           if (input == null) return null;
>>           return Long.valueOf(input.size());
>>       }catch(Exception e){
>>           int errCode = 2106;
>>           String msg = "Error while computing size in " + this.getClass().getSimpleName();
>>           throw new ExecException(msg, errCode, PigException.BUG, e);
>>       }
>>   }
>> I have a script that looks like
>> A = FOREACH A GENERATE STRSPLIT(value, '\u0001') AS values;
>> B = FOREACH B GENERATE values, SIZE(values) AS cnt;
>> and cnt always ends up as 1.  From the code, it looks like TupleSize
>> is intended to only return the number of arguments into the SIZE()
>> UDF?  Is that really the intention and I'm using the SIZE() UDF wrong?
>> Or, is it just a bug and it's supposed to be written as "return
>> Long.valueOf(((Tuple) input.get(0)).size())"?
>> I got this response back:
>> This is definitely a bug. Can you open a Jira ticket?
>> Done!
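
To make the difference between the two return expressions concrete, here is a minimal sketch in plain Java. It does not use Pig's classes: a java.util.List stands in for Pig's Tuple, and the class and method names (TupleSizeSketch, argumentCount, firstArgumentSize) are hypothetical, chosen only for illustration. It assumes SIZE(values) reaches exec() as a one-element input whose first field is the tuple being measured, which is what the quoted report suggests.

```java
import java.util.Arrays;
import java.util.List;

public class TupleSizeSketch {
    // Mirrors the reported behaviour: counts the arguments passed to SIZE(),
    // not the fields of the tuple that was passed in.
    public static Long argumentCount(List<Object> input) {
        if (input == null) return null;
        return Long.valueOf(input.size());
    }

    // Mirrors the proposed fix: unwraps the first argument and counts its fields.
    // The extra null/empty checks are an assumption, not part of the quoted code.
    public static Long firstArgumentSize(List<Object> input) {
        if (input == null || input.isEmpty() || input.get(0) == null) return null;
        return Long.valueOf(((List<?>) input.get(0)).size());
    }

    public static void main(String[] args) {
        // Stand-in for the result of STRSPLIT: a tuple with three fields.
        List<Object> values = Arrays.<Object>asList("a", "b", "c");
        // Stand-in for the input handed to exec(): one argument, the tuple itself.
        List<Object> input = Arrays.<Object>asList(values);

        System.out.println(argumentCount(input));     // prints 1
        System.out.println(firstArgumentSize(input)); // prints 3
    }
}
```

With a single argument the first method always yields 1, which matches the cnt value described in the report, while the second yields the number of fields in the wrapped tuple.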
>