Pig >> mail # user >> Nested Split


The only operators that are supported inside foreach (for now) are:

Filter
Distinct
Order (sort)
Limit

Currently, there will be four pipelines inside the foreach to execute
your statement with filters: one for projecting the group and one for
each of the three COUNTs. x1 will be read three times.
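
For reference, the filter-based form of the script discussed here would probably look like the sketch below. It is not taken from the thread: it reuses the relation names and conditions from the original script and simply replaces the nested split with three nested filters, and it has not been run against any particular Pig release.

```pig
-- Sketch only: same aliases and conditions as the original script,
-- with the unsupported nested split replaced by nested filters.
x1 = load 'file' as (a, b, c);
x2 = group x1 by a;
x3 = foreach x2 {
    -- each nested filter makes its own pass over the x1 bag of the group
    y1 = filter x1 by b == 1 and c == 1;
    y2 = filter x1 by b == 2 and c == 2;
    y3 = filter x1 by b == 3 and c == 3;
    generate group, COUNT(y1), COUNT(y2), COUNT(y3);
}
```

The generate statement accounts for the four pipelines: one projects group, and each COUNT consumes the output of one filter.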

Thanks,
Santhosh

-----Original Message-----
From: Tamir Kamara [mailto:[EMAIL PROTECTED]]
Sent: Sunday, July 19, 2009 12:47 AM
To: [EMAIL PROTECTED]
Subject: Nested Split

Hi,

The following script gives an error because split cannot be used in
nested statements:
x1 = load 'file' as (a, b, c);
x2 = group x1 by a;
x3 = foreach x2 {
    split x1 into y1 if b==1 and c==1, y2 if b==2 and c==2, y3 if b==3 and c==3;
    generate group, COUNT(y1), COUNT(y2), COUNT(y3);
}

This forces y1, y2, and y3 to be defined in separate filter
statements.

Does this mean that x1 will be scanned 3 times?
Shouldn't split work in the nested case also?

Thanks,
Tamir