Pig >> mail # user >> Possible bug in NULL fields handling


Possible bug in NULL fields handling
Hello,

I'm not sure whether this is a bug, but NULL fields do not seem to be
handled correctly:

My data (events):

0,,jawi
,0,juug
,,lfou
0,0,caro

My script:

events = load 'events' using PigStorage(',') AS
(sessionid:chararray, jobid:chararray, user:chararray);
user_events = group events by user;
dump user_events;
event_count_by_user = foreach user_events generate group, COUNT(events);
dump event_count_by_user;

The results:

user_events (correct):
(caro,{(0,0,caro)})
(jawi,{(0,,jawi)})
(juug,{(,0,juug)})
(lfou,{(,,lfou)})

event_count_by_user (incorrect):
(caro,1L)
(jawi,1L)
(juug,0L)
(lfou,0L)

event_count_by_user should be:

(caro,1L)
(jawi,1L)
(juug,1L)
(lfou,1L)

It seems that tuples whose first field is NULL (the ones starting with "(,") are not counted.
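
One way to check this hypothesis is to count the same bags with both COUNT and COUNT_STAR. As far as I can tell from the Pig documentation, COUNT skips a tuple when its first field is NULL, while COUNT_STAR counts every tuple. The sketch below assumes a Pig version that ships COUNT_STAR; the aliases non_null_first and all_tuples are names made up for this illustration.

```pig
-- Same load as in the script above; empty fields between commas are read as NULL.
events = load 'events' using PigStorage(',') AS
    (sessionid:chararray, jobid:chararray, user:chararray);
user_events = group events by user;

-- COUNT is documented to ignore tuples whose first field is NULL,
-- COUNT_STAR to include them, so the two columns should differ
-- exactly for the groups that are miscounted above.
counts = foreach user_events generate
    group,
    COUNT(events)      AS non_null_first,
    COUNT_STAR(events) AS all_tuples;
dump counts;
```

If the two columns differ only for juug and lfou, the behaviour is by design rather than a bug, and COUNT_STAR (or counting a projection that is never NULL here, such as COUNT(events.user)) would give the expected numbers.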

Any suggestions?

Thanks a lot