Pig streaming and multiquery is buggy in local mode?
Thomas Porez 2013-07-11, 13:08
Today I noticed a strange behavior of Pig in local mode (streaming + multiquery).
Here is a minimal script to reproduce the problem.
Suppose we have an input file with multiple lines, for example:
The Pig script is:
myInput = LOAD 'myInput';
A = GROUP myInput BY $0;
B = FOREACH A GENERATE FLATTEN(myInput);
C = STREAM B THROUGH `cat`;
D = GROUP myInput BY $0;
E = FOREACH D GENERATE FLATTEN(myInput);
F = STREAM E THROUGH `cat`;
STORE C INTO 'output1';
STORE F INTO 'output2';
I run the script using the following command:
pig -x local bug.pig
output1 and output2 should each contain an exact copy of my input file, but
this is not the case. Each contains only one line (the first line of the file):
cat output1/part*
cat output2/part*
For information, the same script seems to work when Pig runs on Hadoop.
If I comment out one of the two STORE operations, it works as expected (I think
that's because no multiquery is run).
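One way to check the multiquery guess, as a rough sketch only: run the same script twice, once with the defaults and once with Pig's -no_multiquery (-M) option, and compare how many lines end up in each output. The helper names count_lines and run_case are made up for this illustration, and the sketch assumes bug.pig and the input file are in the current directory.

```shell
#!/bin/sh
# Sketch: compare output line counts with multiquery on and off.
# count_lines and run_case are hypothetical helpers, not part of Pig.

# Count the lines across all part files of one output directory.
# Prints 0 if the directory has no part files.
count_lines() {
    cat "$1"/part* 2>/dev/null | wc -l | tr -d ' '
}

# Run bug.pig in local mode with one optional extra pig option,
# then report how many lines each output directory contains.
run_case() {
    rm -rf output1 output2
    # $1 is deliberately unquoted so an empty value adds no argument.
    pig -x local $1 bug.pig
    echo "pig $1: output1=$(count_lines output1) output2=$(count_lines output2)"
}

# Only run when pig and the script are actually available.
if command -v pig >/dev/null 2>&1 && [ -f bug.pig ]; then
    run_case ""                 # default: multiquery enabled
    run_case "-no_multiquery"   # multiquery disabled
fi
```

If the counts match the input file only in the -no_multiquery run, that would support the idea that the problem comes from multiquery combined with streaming in local mode.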