I have a simple join question.
base = load 'input1' USING PigStorage( ',' ) as (id1, field1, field2);
stats = load 'input2' USING PigStorage(',') as (id1, mean, median);
joined = JOIN base BY id1, stats BY id1;
final = FOREACH joined GENERATE base::id1, base::field1, base::field2, stats::mean, stats::median;
STORE final INTO 'output' USING PigStorage( ',' );
But something doesn't feel right.
The inputs are on the order of MBs, whereas the outputs are around 100 GB.
I tried it on a sample file where base is 35 MB and stats is 10 MB, and the output still explodes to GBs.
What am I missing?
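One thing that might be worth ruling out is duplicate join keys, since an inner join emits one output row for every matching pair of rows. Below is a rough sketch of a check, assuming id1 is not guaranteed to be unique in either input; the aliases base_ids, base_grp, base_cnt and base_dups are made up for illustration, and the same steps could be repeated for stats.

```pig
-- Count how many rows share each id1 in base (hypothetical aliases).
base_ids  = FOREACH base GENERATE id1;
base_grp  = GROUP base_ids BY id1;
base_cnt  = FOREACH base_grp GENERATE group AS id1, COUNT(base_ids) AS n;

-- Keep only keys that occur more than once; these multiply in the JOIN.
base_dups = FILTER base_cnt BY n > 1;
DUMP base_dups;
```

If a key appears m times in base and k times in stats, the join would produce m * k rows for that key, so even a few heavily repeated ids could account for a large output.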