Pig >> mail # user >> Bug when COUNTing bag of tuple ?


Bug when COUNTing bag of tuple ?
Hello,

I think there is a bug in Pig when using COUNT on a bag of tuples that
contain empty fields. Here is a minimal script to reproduce it.

I have this CSV file:
,a
1,a
2,a
,a
3,b
4,b
5,b

I use this script:
test = LOAD 'test.csv' USING org.apache.pig.builtin.PigStorage(',') AS
(key:chararray, value:chararray);
test = GROUP test BY value;
DUMP test;
test = FOREACH test GENERATE group, COUNT(test);
DUMP test;

And the output is:
(a,{(,a),(1,a),(2,a),(,a)})
(b,{(3,b),(4,b),(5,b)})
(a,2)
(b,3)

Is this normal? I was expecting:
(a,{(,a),(1,a),(2,a),(,a)})
(b,{(3,b),(4,b),(5,b)})
(a,*4*)
(b,3)
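
One comparison that may help narrow this down: as far as I understand the Pig documentation, COUNT skips any tuple whose first field is null, while COUNT_STAR counts every tuple in the bag. Below is a minimal sketch of the same script with both functions side by side. The aliases rows, grouped and counts, and the field names non_null_first and all_rows, are only illustrative names, and the null-handling behaviour is stated here as an assumption rather than something verified on this data.

```pig
-- Sketch only: same input and grouping as the script above.
-- Assumption: the empty key fields in test.csv are loaded as nulls.
rows = LOAD 'test.csv' USING org.apache.pig.builtin.PigStorage(',') AS
    (key:chararray, value:chararray);
grouped = GROUP rows BY value;

-- COUNT is assumed to ignore tuples whose first field (key) is null;
-- COUNT_STAR is assumed to count every tuple in the bag.
counts = FOREACH grouped GENERATE
    group,
    COUNT(rows) AS non_null_first,
    COUNT_STAR(rows) AS all_rows;
DUMP counts;
```

If that assumption holds, all_rows for group a should include the two tuples with an empty key, while non_null_first should match the COUNT result shown above.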

Regards,

Kevin Lion
Capptain.com - Pilot your Apps