Pig, mail # user - Join question


Join question
jamal sasha 2013-04-01, 21:06
Hi,
  I have a simple join question.
base = LOAD 'input1' USING PigStorage(',') AS (id1, field1, field2);
stats = LOAD 'input2' USING PigStorage(',') AS (id1, mean, median);
joined = JOIN base BY id1, stats BY id1;
final = FOREACH joined GENERATE base::id1, base::field1, base::field2,
        stats::mean, stats::median;
STORE final INTO 'output' USING PigStorage(',');

But something doesn't feel right.
The inputs are on the order of MBs, whereas the outputs are around 100 GB.

I tried it on sample files
where base is 35 MB
and stats is 10 MB,
and the output still explodes to GBs.
What am I missing?
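
One possible explanation, offered only as a guess since the data is not shown: an inner join emits one row for every pairing of matching records, so an id1 that appears m times in base and n times in stats contributes m * n output rows. A minimal sketch of a check for this, reusing the paths and field names from the script above (the relation names base_counts, stats_counts, per_key and total are made up for illustration):

```pig
-- Same inputs as in the script above.
base = LOAD 'input1' USING PigStorage(',') AS (id1, field1, field2);
stats = LOAD 'input2' USING PigStorage(',') AS (id1, mean, median);

-- Count how many times each id1 occurs in base.
base_keys = FOREACH base GENERATE id1;
base_grp = GROUP base_keys BY id1;
base_counts = FOREACH base_grp GENERATE group AS id1, COUNT(base_keys) AS n;

-- Count how many times each id1 occurs in stats.
stats_keys = FOREACH stats GENERATE id1;
stats_grp = GROUP stats_keys BY id1;
stats_counts = FOREACH stats_grp GENERATE group AS id1, COUNT(stats_keys) AS n;

-- For each id1 present on both sides, the inner join emits n_base * n_stats rows.
counts = JOIN base_counts BY id1, stats_counts BY id1;
per_key = FOREACH counts GENERATE base_counts::id1 AS id1,
          base_counts::n * stats_counts::n AS joined_rows;

-- Sum over all keys to estimate the number of rows the original JOIN produces.
all_keys = GROUP per_key ALL;
total = FOREACH all_keys GENERATE SUM(per_key.joined_rows) AS total_rows;
DUMP total;
```

If total_rows comes out far larger than the number of records in either input, repeated keys (possibly empty or null-like id1 values) would account for the size of the output. If it is close to the input record counts, the cause lies elsewhere.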
+
Mehmet Tepedelenlioglu 2013-04-01, 21:24
+
jamal sasha 2013-04-01, 22:44
+
Mehmet Tepedelenlioglu 2013-04-02, 01:20
+
F. Jerrell Schivers 2013-09-04, 23:39