Pig, mail # user - Nested Split
RE: Nested Split
Santhosh Srinivasan 2009-07-19, 15:45
The only operators that are supported inside foreach (for now) are:

Filter
Distinct
Sort (ORDER BY)
Limit

Currently, when your statement is rewritten with filters, it will execute
as four pipelines inside the foreach: one to project the group and one
for each of the three COUNTs. As a result, x1 will be read three times.

Thanks,
Santhosh
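
For reference, the filter-based form discussed in this thread might look like the sketch below. It reuses the relation names and conditions from the original message that follows; the layout is an assumption, and it has not been checked against a particular Pig release.

```pig
-- Sketch only: the nested split replaced by three nested filters.
x1 = load 'file' as (a, b, c);
x2 = group x1 by a;
x3 = foreach x2 {
    -- Each filter works on the bag x1 for the current group, so that bag
    -- is read once per filter (three times in total).
    y1 = filter x1 by b == 1 and c == 1;
    y2 = filter x1 by b == 2 and c == 2;
    y3 = filter x1 by b == 3 and c == 3;
    generate group, COUNT(y1), COUNT(y2), COUNT(y3);
}
```

Each nested filter feeds one of the COUNT pipelines described above, and the fourth pipeline projects the group key.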

-----Original Message-----
From: Tamir Kamara [mailto:[EMAIL PROTECTED]]
Sent: Sunday, July 19, 2009 12:47 AM
To: [EMAIL PROTECTED]
Subject: Nested Split

Hi,

The following script gives an error because split cannot be used in
nested statements:
x1 = load 'file' as (a, b, c);
x2 = group x1 by a;
x3 = foreach x2 {
split x1 into y1 if b==1 and c==1, y2 if b==2 and c==2, y3 if b==3 and c==3;
generate group, COUNT(y1), COUNT(y2), COUNT(y3);
}

This forces y1, y2, and y3 to be defined in separate statements, each
using filter.

Does this mean that x1 will be scanned 3 times?
Shouldn't split work in the nested case also?

Thanks,
Tamir