Pig, mail # user - Problem with my pig script


Problem with my pig script
Sameer Tilak 2013-10-29, 21:47
Hello Pig experts,

I have the following simple script. To isolate the issue, I have replaced my real UDF with a dummy UDF, TupleTest, that reproduces the problem I am having. TupleTest generates a tuple in the following manner:

    boolean randomboolean = rngen.nextBoolean();

    if (randomboolean) {
        output.set(0, 1);
        output.set(1, "Black");
    } else {
        output.set(0, 0);
        output.set(1, "White");
    }

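For anyone who wants to try the tuple logic outside of Pig, here is a minimal standalone sketch of it. It is not the actual UDF: a plain java.util.List stands in for Pig's Tuple, and the class name TupleTestSketch and the method makeTuple are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Standalone stand-in for the dummy UDF's body (assumption: a plain List
// replaces Pig's Tuple, so this runs without any Pig jars).
class TupleTestSketch {

    // Returns [1, "Black"] or [0, "White"] depending on one random boolean draw.
    static List<Object> makeTuple(Random rngen) {
        List<Object> output = new ArrayList<>();
        if (rngen.nextBoolean()) {
            output.add(1);
            output.add("Black");
        } else {
            output.add(0);
            output.add("White");
        }
        return output;
    }

    public static void main(String[] args) {
        Random rngen = new Random(); // unseeded, so every run may differ
        for (int i = 0; i < 4; i++) {
            System.out.println(makeTuple(rngen));
        }
    }
}
```

Because the generator is unseeded, two passes over the same input will generally produce different tuples.
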
Pig script:

REGISTER /N/u/sameer/software/pig-0.11.1/myudfs.jar

DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();

A = LOAD '/scratch/file.seq' USING SequenceFileLoader AS (key: chararray, value: chararray);

AU = FOREACH A GENERATE FLATTEN(myudfs.TupleTest(key, value)) AS (randbool: int, randstr: chararray);
STORE AU INTO '/scratch/AU';

B = GROUP AU BY randbool;
STORE B INTO '/scratch/B';

X = FOREACH B GENERATE group, COUNT(AU);
DUMP X;
Here is the sample output:

hadoop --config $HADOOP_CONF_DIR fs -cat /scratch/AU/part-m-00000
Warning: $HADOOP_HOME is deprecated.

1    Black
1    Black
0    White
1    Black

hadoop --config $HADOOP_CONF_DIR fs -cat /scratch/B/part-r-00000
Warning: $HADOOP_HOME is deprecated.

0    {(0,White)}
1    {(1,Black),(1,Black),(1,Black)}

X:
(0,2)
(1,2)

As you can see, X is wrong; it should be (0,1) and (1,3), matching the contents of B. Can you please help me with this?
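
To make the expected result concrete, here is a small standalone Java sketch (plain Java rather than Pig; the class name ExpectedCounts and the method countByKey are made up for illustration). It groups the four AU rows shown above by randbool and counts each group, which is what I expect GROUP followed by COUNT to produce.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Pig: reproduces the expected counts by hand.
class ExpectedCounts {

    // Counts rows per key, mirroring GROUP ... BY randbool followed by COUNT.
    static Map<Integer, Long> countByKey(int[] randbools) {
        Map<Integer, Long> counts = new TreeMap<>(); // sorted by key
        for (int key : randbools) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] au = {1, 1, 0, 1}; // randbool column of the AU output above
        countByKey(au).forEach((k, v) -> System.out.println("(" + k + "," + v + ")"));
        // prints (0,1) then (1,3)
    }
}
```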