Pig, mail # user - Pig streaming and multiquery is buggy on local mode ?


Thomas Porez 2013-07-11, 13:08
Re: Pig streaming and multiquery is buggy on local mode ?
Thomas Porez 2013-07-11, 14:18
It seems that the script I posted was not correct: some operators were
inverted. The correct version is:

# bug.pig
MYINPUT = LOAD 'myinput';

A = GROUP MYINPUT BY $0;
B = FOREACH A GENERATE FLATTEN(MYINPUT);
C = STREAM B THROUGH `ruby script.rb`;

D = GROUP MYINPUT BY $0;
E = FOREACH D GENERATE FLATTEN(MYINPUT);
F = STREAM E THROUGH `ruby script.rb`;

STORE C into 'output1';
STORE F into 'output2';

# I run the script using the following command:
pig -x local bug.pig

# And show the output
cat output1/part*
cat output2/part*
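
To make the check above repeatable, it could be wrapped in a small shell helper. This is only a sketch: the names make_input and same_lines are hypothetical, the sample data is the six-line input from the original message, and the comparison assumes the streaming command passes records through unchanged (as `cat` does), which may not hold for script.rb.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Pig.

# Write the six-line sample input from the original message to file $1.
make_input() {
    printf '%s\n' 1 2 3 1 2 3 > "$1"
}

# Succeed only if the expected file ($1) and the remaining files, taken
# together, contain the same lines. Order is ignored because GROUP may
# reorder records.
same_lines() {
    expected=$1
    shift
    [ "$(sort "$expected")" = "$(cat "$@" | sort)" ]
}

# Possible use after `pig -x local bug.pig`:
#   make_input myinput
#   same_lines myinput output1/part* || echo "output1 differs from input"
#   same_lines myinput output2/part* || echo "output2 differs from input"
```

Comparing sorted contents rather than raw files avoids false alarms from record reordering while still catching missing lines.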
2013/7/11 Thomas Porez <[EMAIL PROTECTED]>

> Today I noticed a strange behavior of Pig in local mode (streaming +
> multiquery).
> Here is a minimal script to reproduce the problem.
>
> Suppose an input file with multiple lines, for example:
> # myInput
> 1
> 2
> 3
> 1
> 2
> 3
>
> The Pig script is:
> # bug.pig
> MyInput = LOAD 'myInput;
>
> A = myInput GROUP BY $ 0;
> B = FOREACH A GENERATE FLATTEN (myInput);
> C = B STREAM THROUGH `cat`;
>
> D = myInput GROUP BY $ 0;
> E = FOREACH D GENERATE FLATTEN (myInput);
> STREAM THROUGH E F = `cat`;
>
> STORE C into 'output1;
> STORE F into 'output2;
>
> I run the script using the following command:
> pig -x local bug.pig
>
> We should find a perfect copy of my input file in output1 and output2,
> but this is not the case. We find only one line (the first line of the file):
> output1/part cat *
> output2/part cat *
>
> For information, the corresponding Pig script seems to work properly on
> Hadoop.
> If I comment out one of the two STORE operations, it works as expected (I think
> it's because no multiquery is run).
>
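
One way to test the multiquery hypothesis without editing the script might be to turn the optimization off from the command line. The sketch below assumes the installed Pig version supports the -M (-no_multiquery) option; the wrapper name run_without_multiquery is hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes Pig's -M (-no_multiquery) option is available.
# The function name is hypothetical.

# Run the Pig script given as $1 in local mode with multiquery
# optimization turned off, so each STORE is executed as its own job.
run_without_multiquery() {
    pig -x local -M "$1"
}

# Possible use:
#   run_without_multiquery bug.pig
```

If both outputs are complete with multiquery disabled but truncated with it enabled, that would point at the interaction between streaming and multiquery in local mode rather than at the script itself.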
Thomas Porez 2013-07-11, 14:48