Pig >> mail # user >> Possible bug in NULL fields handling


Re: Possible bug in NULL fields handling
Thank you very much for your answer!
I was not aware of the COUNT_STAR() function.
I guess it has been introduced recently (otherwise it is a bug in
the documentation :-)

Anyway, the final proposal in PIG-1014 seems OK to me. At least, I
think that the current behavior of COUNT when applied to bags is
misleading.

Dmitriy Ryaboy wrote:
> Currently, COUNT of a bag will ignore tuples whose first field is
> null (this stems from the fact that COUNT of a column counts only
> non-null values, for SQL compatibility). You may want to try using
> COUNT_STAR. This behavior is currently being reconsidered:
> https://issues.apache.org/jira/browse/PIG-1014 (please provide input!)
>
> -Dmitriy
>
> On Thu, Oct 15, 2009 at 8:51 AM, Vincent BARAT <[EMAIL PROTECTED]> wrote:
>> Hello,
>>
>> I'm not sure if it's a bug, but the handling of NULL fields seems not to
>> work correctly:
>>
>> My data (events):
>>
>> 0,,jawi
>> ,0,juug
>> ,,lfou
>> 0,0,caro
>>
>> My script:
>>
>> events = load 'events' using PigStorage(',') AS (sessionid:chararray,
>> jobid:chararray, user:chararray);
>> user_events = group events by user;
>> dump user_events;
>> event_count_by_user = foreach user_events generate group, COUNT(events);
>> dump event_count_by_user;
>>
>> The results:
>>
>> user_events (correct):
>> (caro,{(0,0,caro)})
>> (jawi,{(0,,jawi)})
>> (juug,{(,0,juug)})
>> (lfou,{(,,lfou)})
>>
>> event_count_by_user (incorrect):
>> (caro,1L)
>> (jawi,1L)
>> (juug,0L)
>> (lfou,0L)
>>
>> event_count_by_user should be:
>>
>> (caro,1L)
>> (jawi,1L)
>> (juug,1L)
>> (lfou,1L)
>>
>> It seems that tuples starting with (, are not counted correctly.
>>
>> Any suggestion?
>>
>> Thanks a lot
>>
>>
>>
>
>
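
For reference, the COUNT_STAR suggestion from the reply above can be sketched as a small change to the original script. This is only a sketch: it assumes a Pig version that ships the COUNT_STAR builtin and the same 'events' input file, and it has not been checked against any particular release.

```pig
-- Same load and grouping as in the original script.
events = load 'events' using PigStorage(',') AS (sessionid:chararray,
    jobid:chararray, user:chararray);
user_events = group events by user;

-- COUNT_STAR counts every tuple in the bag, including tuples whose
-- first field is null; COUNT skips those tuples.
event_count_by_user = foreach user_events generate group, COUNT_STAR(events);
dump event_count_by_user;
```

With the four sample rows, each user group holds exactly one tuple, so every user (including juug and lfou) should come out with a count of 1.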