Pig, mail # user - Bug when COUNTing bag of tuple ?


Bug when COUNTing bag of tuple ?
Kevin Lion 2012-03-08, 16:55
Hello,

I think there is a bug in Pig when using COUNT on a bag of tuples in which
some tuples have an empty first field. Here is a minimal case that reproduces it:

I have this CSV file (test.csv):
,a
1,a
2,a
,a
3,b
4,b
5,b

I use this script:
test = LOAD 'test.csv' USING org.apache.pig.builtin.PigStorage(',') AS
(key:chararray, value:chararray);
test = GROUP test BY value;
DUMP test;
test = FOREACH test GENERATE group, COUNT(test);
DUMP test;

And the output of the two DUMP statements is:
(a,{(,a),(1,a),(2,a),(,a)})
(b,{(3,b),(4,b),(5,b)})
(a,2)
(b,3)

Is this normal? I was expecting:
(a,{(,a),(1,a),(2,a),(,a)})
(b,{(3,b),(4,b),(5,b)})
(a,*4*)
(b,3)
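
For comparison, here is a minimal sketch of the same job using COUNT_STAR next to COUNT. As far as I understand the Pig documentation, COUNT does not count a tuple whose first field is null, and PigStorage loads an empty field as null, which may explain the (a,2) result; COUNT_STAR is documented as including such tuples. The alias and field names (raw, grouped, counts, non_null_keys, all_rows) are only illustrative.

```pig
-- Sketch only: same input file as above, comparing the two builtin counters.
raw = LOAD 'test.csv' USING PigStorage(',') AS (key:chararray, value:chararray);
grouped = GROUP raw BY value;
-- COUNT is documented to skip tuples whose first field is null;
-- COUNT_STAR is documented to count every tuple in the bag.
counts = FOREACH grouped GENERATE group,
             COUNT(raw) AS non_null_keys,
             COUNT_STAR(raw) AS all_rows;
DUMP counts;
```

If the two columns differ for group a, the difference would come from the rows with an empty key rather than from the grouping itself.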

Regards,

Kevin Lion
Capptain.com - Pilot your Apps