Pig >> mail # user >> Problem with my pig script


Problem with my pig script
Hello Pig experts,

I have the following simple script. For simplicity, I have replaced my UDF with this dummy UDF that shows the problem that I am having. UDF TupleTest generates a tuple in the following manner:

    // Pick one of two fixed tuples at random.
    boolean randomboolean = rngen.nextBoolean();

    if (randomboolean) {
        output.set(0, 1);
        output.set(1, "Black");
    } else {
        output.set(0, 0);
        output.set(1, "White");
    }

Pig script:

REGISTER /N/u/sameer/software/pig-0.11.1/myudfs.jar

DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();

A = LOAD '/scratch/file.seq' USING SequenceFileLoader AS (key: chararray, value: chararray);

AU = FOREACH A GENERATE FLATTEN(myudfs.TupleTest(key, value)) AS (randbool: int, randstr: chararray);
STORE AU into '/scratch/AU';

B = GROUP AU BY randbool;
STORE B into '/scratch/B';

X = FOREACH B GENERATE group, COUNT(AU);
DUMP X;
Here is the sample output:

hadoop --config $HADOOP_CONF_DIR fs -cat /scratch/AU/part-m-00000
Warning: $HADOOP_HOME is deprecated.

1    Black
1    Black
0    White
1    Black

hadoop --config $HADOOP_CONF_DIR fs -cat /scratch/B/part-r-00000
Warning: $HADOOP_HOME is deprecated.

0    {(0,White)}
1    {(1,Black),(1,Black),(1,Black)}

X:
(0,2)
(1,2)

As you can see, X is wrong. Given the contents of B, it should be (0,1) and (1,3). Can you please help me with this?
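
To make the expected result concrete, here is a minimal plain-Java sketch (no Pig involved) that groups the four AU rows shown above by their first field and counts each group, which is what GROUP AU BY randbool followed by COUNT(AU) should amount to for this data. The class name GroupCountSketch and the method countByRandbool are made up for illustration and are not part of my UDF or of Pig.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupCountSketch {

    // Hypothetical helper: counts rows per value of the first field (randbool).
    // A TreeMap keeps the keys sorted so the result prints in a stable order.
    static Map<Integer, Long> countByRandbool(List<Object[]> rows) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (Object[] row : rows) {
            counts.merge((Integer) row[0], 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The four rows from /scratch/AU/part-m-00000 as shown in the sample output.
        List<Object[]> au = Arrays.asList(
            new Object[] {1, "Black"},
            new Object[] {1, "Black"},
            new Object[] {0, "White"},
            new Object[] {1, "Black"}
        );
        System.out.println(countByRandbool(au)); // prints {0=1, 1=3}
    }
}
```

For the stored AU rows this gives one row for key 0 and three for key 1, which matches the bags stored in B but not the DUMP of X.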

     