Sameer Tilak 2013-10-29, 21:47
Re: Problem with my pig script
Hi Sameer,

Can you replace "DUMP X" with "STORE X INTO '/scratch/X'" and retry?
I believe Pig's multi-query optimization only applies to STORE statements; a DUMP is executed as an independent query.
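Concretely, under that assumption the tail of your script would become a third STORE (the path '/scratch/X' here is just an illustration):

```pig
-- Replace the DUMP with a STORE so this query can participate in
-- multi-query optimization alongside the other two STOREs,
-- instead of re-running the pipeline (and the UDF) in a separate job.
X = FOREACH B GENERATE group, COUNT(AU);
STORE X INTO '/scratch/X';
```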

Aside from that, randomness in Pig/MapReduce code is always tricky.
A mapper can be retried after it has already delivered output to a subset of the reducers, so with randomness like this you always risk ending up with an inconsistent result.
(I don't think that's what you're hitting here, though.)
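One way around that, if you need the randomness to be stable, is to derive the "random" value deterministically from the record itself, so any rerun produces the same answer for the same input. A minimal sketch (the class and method names here are made up, not from your UDF):

```java
import java.util.Random;

// Hypothetical sketch: seed the RNG from the input key so that mapper
// retries (and separate jobs reading the same data) agree on the output.
public class DeterministicChoice {
    public static boolean choose(String key) {
        // String.hashCode() is specified by the Java language spec,
        // so the same key always yields the same seed and the same boolean.
        return new Random(key.hashCode()).nextBoolean();
    }
}
```

Inside the UDF you would call something like choose(key) instead of rngen.nextBoolean(); the results still vary across keys, but each record's output is reproducible.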

Koji
On Oct 29, 2013, at 5:47 PM, Sameer Tilak <[EMAIL PROTECTED]> wrote:

> Hello Pig experts,
>
> I have the following simple script. For simplicity, I have replaced my UDF with this dummy UDF that shows the problem that I am having. UDF TupleTest generates a tuple in the following manner:
>
> boolean randomboolean = rngen.nextBoolean();
>
> if (randomboolean) {
>     output.set(0, 1);
>     output.set(1, "Black");
> } else {
>     output.set(0, 0);
>     output.set(1, "White");
> }
>
>
> Pig script:
>
> REGISTER /N/u/sameer/software/pig-0.11.1/myudfs.jar
>
> DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
>
> A = LOAD '/scratch/file.seq' USING SequenceFileLoader AS (key: chararray, value: chararray);
>
> AU = FOREACH A GENERATE FLATTEN(myudfs.TupleTest(key, value)) AS (randbool: int, randstr: chararray);
> STORE AU into '/scratch/AU';
>
> B = GROUP AU BY randbool;
> STORE B into '/scratch/B';
>
> X = FOREACH B GENERATE group, COUNT(AU);
> DUMP X;
>
>
> Here is the sample output:
>
> hadoop --config $HADOOP_CONF_DIR fs -cat /scratch/AU/part-m-00000
> Warning: $HADOOP_HOME is deprecated.
>
> 1    Black
> 1    Black
> 0    White
> 1    Black
>
> hadoop --config $HADOOP_CONF_DIR fs -cat /scratch/B/part-r-00000
> Warning: $HADOOP_HOME is deprecated.
>
> 0    {(0,White)}
> 1    {(1,Black),(1,Black),(1,Black)}
>
> X:
> (0,2)
> (1,2)
>
> As you can see, X is wrong; given B above it should be (0,1), (1,3). Can you please help me with this?
>
>