jamal sasha 2013-04-01, 21:06
Are your ids unique?
On 4/1/13 2:06 PM, "jamal sasha" <[EMAIL PROTECTED]> wrote:
> I have a simple join question.
>base = load 'input1' USING PigStorage(',') as (id1, field1, field2);
>stats = load 'input2' USING PigStorage(',') as (id1, mean, median);
>joined = JOIN base BY id1, stats BY id1;
>final = FOREACH joined GENERATE base::id1, base::field1, base::field2, stats::mean, stats::median;
>STORE final INTO 'output' USING PigStorage(',');
>But something doesn't feel right.
>The inputs are on the order of MBs, whereas the outputs are on the order of 100 GB.
>I tried it on a sample where base is 35 MB and stats is 10 MB,
>and the output still explodes to GBs.
>What am I missing?
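The reply's question points at the usual cause. In an inner join such as `JOIN base BY id1, stats BY id1`, every `base` row with a given `id1` is paired with every `stats` row that has the same `id1`. The output therefore has the sum over keys k of n_base(k) × n_stats(k) rows, so ids that repeat on both sides make the output grow multiplicatively, not additively. The Python sketch below only illustrates that arithmetic and is not Pig code. The helper names and key lists are hypothetical sample data.

```python
from collections import Counter

def inner_join_row_count(left_keys, right_keys):
    """Rows produced by an inner equi-join: for each key present on both
    sides, every left row is paired with every right row."""
    left = Counter(left_keys)
    right = Counter(right_keys)
    return sum(n * right[k] for k, n in left.items() if k in right)

def duplicate_keys(keys):
    """Keys that occur more than once, mapped to their occurrence counts."""
    return {k: n for k, n in Counter(keys).items() if n > 1}

# Hypothetical sample: ids unique in base, repeated in stats.
base_ids = ["a", "b", "c"]
stats_ids = ["a", "a", "a", "b", "b"]
print(inner_join_row_count(base_ids, stats_ids))  # 3*1 + 2*1 = 5
print(duplicate_keys(stats_ids))                  # {'a': 3, 'b': 2}

# One key repeated 1000 times on each side: 2000 input rows in total,
# but 1000 * 1000 = 1,000,000 joined rows.
print(inner_join_row_count(["x"] * 1000, ["x"] * 1000))
```

If `duplicate_keys` reports repeats on both inputs, the blow-up is expected join behavior rather than a bug. The fix is to deduplicate or aggregate one side per `id1` before joining.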
jamal sasha 2013-04-01, 22:44
Mehmet Tepedelenlioglu 2013-04-02, 01:20
F. Jerrell Schivers 2013-09-04, 23:39