Vincent BARAT 2009-10-15, 12:51
Dmitriy Ryaboy 2009-10-15, 13:09
Re: Possible bug in NULL fields handling
Vincent BARAT 2009-10-15, 13:40
Thank you very much for your answer!
I was not aware of the COUNT_STAR() function. I guess it was introduced recently (otherwise it is a bug in the documentation :-)

Anyway, the final proposal in PIG-1014 seems OK to me. At the very least, I think that the current behavior of COUNT when applied to bags is misleading.

Dmitriy Ryaboy wrote:
> Currently, COUNT of a bag will ignore bags which have the first field
> as null (this stems from the fact that COUNT of a column will count
> non-null columns, for sql compatibility). You may want to try using
> COUNT_STAR. This behavior is currently being reconsidered:
> https://issues.apache.org/jira/browse/PIG-1014 (please provide input!)
>
> -Dmitriy
>
> On Thu, Oct 15, 2009 at 8:51 AM, Vincent BARAT <[EMAIL PROTECTED]> wrote:
>> Hello,
>>
>> I'm not sure if it's a bug, but the handling of NULL fields seems not to
>> work correctly:
>>
>> My data (events):
>>
>> 0,,jawi
>> ,0,juug
>> ,,lfou
>> 0,0,caro
>>
>> My script:
>>
>> events = load 'events' using PigStorage(',') AS (sessionid:chararray,
>> jobid:chararray, user:chararray);
>> user_events = group events by user;
>> dump user_events;
>> event_count_by_user = foreach user_events generate group, COUNT(events);
>> dump event_count_by_user;
>>
>> The results:
>>
>> user_events (correct):
>> (caro,{(0,0,caro)})
>> (jawi,{(0,,jawi)})
>> (juug,{(,0,juug)})
>> (lfou,{(,,lfou)})
>>
>> event_count_by_user (incorrect):
>> (caro,1L)
>> (jawi,1L)
>> (juug,0L)
>> (lfou,0L)
>>
>> event_count_by_user should be:
>>
>> (caro,1L)
>> (jawi,1L)
>> (juug,1L)
>> (lfou,1L)
>>
>> It seems that tuples starting with (, are not counted correctly.
>>
>> Any suggestion?
>>
>> Thanks a lot
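
For reference, the COUNT_STAR workaround suggested in the quoted reply could be applied to the original script roughly as follows. This is only a sketch and has not been run here: it assumes a Pig version that ships COUNT_STAR, and it reuses the file name, schema, and relation names from the quoted script unchanged.

```pig
-- Load the same comma-separated events file with the same schema as in the quoted script.
events = LOAD 'events' USING PigStorage(',')
         AS (sessionid:chararray, jobid:chararray, user:chararray);

-- Group all events by user, exactly as before.
user_events = GROUP events BY user;

-- Use COUNT_STAR instead of COUNT, so that tuples whose first field
-- (sessionid) is null are still included in the per-user count.
event_count_by_user = FOREACH user_events GENERATE group, COUNT_STAR(events);

DUMP event_count_by_user;
```

Since COUNT_STAR is meant to count every tuple in the bag regardless of null fields, the rows for juug and lfou would be expected to report one event each instead of zero.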