Pig >> mail # user >> Possible bug in NULL fields handling


Possible bug in NULL fields handling
Hello,

I'm not sure whether this is a bug, but NULL fields do not seem to be
handled correctly:

My data (events):

0,,jawi
,0,juug
,,lfou
0,0,caro

My script:

events = load 'events' using PigStorage(',') AS
(sessionid:chararray, jobid:chararray, user:chararray);
user_events = group events by user;
dump user_events;
event_count_by_user = foreach user_events generate group, COUNT(events);
dump event_count_by_user;

The results:

user_events (correct):
(caro,{(0,0,caro)})
(jawi,{(0,,jawi)})
(juug,{(,0,juug)})
(lfou,{(,,lfou)})

event_count_by_user (incorrect):
(caro,1L)
(jawi,1L)
(juug,0L)
(lfou,0L)

event_count_by_user should be:

(caro,1L)
(jawi,1L)
(juug,1L)
(lfou,1L)

It seems that tuples whose first field is NULL (those printed starting with "(,") are not counted.
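
One way to test that hypothesis is sketched below. It assumes that COUNT skips any tuple whose first field is NULL, which is consistent with the results above but is not confirmed here. Under that assumption, counting a projection whose only field is never NULL (here `user`, the grouping key) should give 1 for every user in the sample data.

```pig
-- Sketch only. Assumption: COUNT ignores tuples whose FIRST field is null.
events = load 'events' using PigStorage(',') AS
(sessionid:chararray, jobid:chararray, user:chararray);
user_events = group events by user;
-- events.user is a bag of one-field tuples holding only the user column.
-- user is the grouping key and is never null in the sample data, so no
-- tuple in this bag starts with a null field.
event_count_by_user = foreach user_events generate group, COUNT(events.user);
dump event_count_by_user;
```

Pig releases that provide COUNT_STAR offer another option, since it counts tuples regardless of NULL fields. Whether it is available depends on the Pig version in use.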

Any suggestions?

Thanks a lot
Dmitriy Ryaboy 2009-10-15, 13:09
Vincent BARAT 2009-10-15, 13:40