Vincent BARAT 2009-10-15, 12:51
Dmitriy Ryaboy 2009-10-15, 13:09
Thank you very much for your answer!
I was not aware of the COUNT_STAR() function.
I guess it was introduced recently (otherwise it is a bug in
the documentation :-)
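For reference, a minimal sketch of the suggestion applied to the script quoted below. It is untested against the original data, and the output aliases non_null_first and all_events are illustrative names that appear nowhere in the thread.

```pig
-- Same load and grouping as the script quoted in this thread.
events = LOAD 'events' USING PigStorage(',') AS (sessionid:chararray,
    jobid:chararray, user:chararray);
user_events = GROUP events BY user;

-- COUNT skips tuples whose first field (here, sessionid) is null;
-- COUNT_STAR counts every tuple in the bag, nulls included.
-- The aliases below are hypothetical, chosen only for this sketch.
event_count_by_user = FOREACH user_events GENERATE group,
    COUNT(events) AS non_null_first,
    COUNT_STAR(events) AS all_events;
DUMP event_count_by_user;
```

Emitting both counts side by side makes it easy to see which groups contain tuples with a null first field: the two numbers differ only for those groups.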
Anyway, the final proposal in PIG-1014 seems OK to me. At least, I
think that the current behavior of COUNT when applied to bags is
Dmitriy Ryaboy wrote:
> Currently, COUNT of a bag will ignore tuples which have the first field
> as null (this stems from the fact that COUNT of a column will count
> non-null values, for SQL compatibility). You may want to try using
> COUNT_STAR. This behavior is currently being reconsidered:
> https://issues.apache.org/jira/browse/PIG-1014 (please provide input!)
> On Thu, Oct 15, 2009 at 8:51 AM, Vincent BARAT <[EMAIL PROTECTED]> wrote:
>> I'm not sure if it's a bug, but the handling of NULL fields seems not to
>> work correctly:
>> My data (events):
>> My script:
>> events = load 'events' using PigStorage(',') AS (sessionid:chararray,
>> jobid:chararray, user:chararray);
>> user_events = group events by user;
>> dump user_events;
>> event_count_by_user = foreach user_events generate group, COUNT(events);
>> dump event_count_by_user;
>> The results:
>> user_events (correct):
>> event_count_by_user (incorrect):
>> event_count_by_user should be:
>> It seems that tuples starting with (, are not counted correctly.
>> Any suggestion?
>> Thanks a lot