Pig >> mail # user >> FOREACH GENERATE Conditional?


FOREACH GENERATE Conditional?
Hi folks,
I have a pig script that right now looks like this:


likes = FILTER main_set BY blah == 'a' AND meh == 'b';
likes_time = FOREACH likes GENERATE date, 'likes' AS type;

dislikes = FILTER main_set BY blah == 'b' AND meh == 'c';
dislikes_time = FOREACH dislikes GENERATE date, 'dislikes' AS type;

newuserregs = FILTER main_set BY blah == 'c' AND meh == 'd';
newuserregs_time = FOREACH newuserregs GENERATE date, 'newuserregs' AS type;
...

all_time = UNION likes_time, dislikes_time, newuserregs_time;


As you can see, I'm filtering main_set repeatedly, generating from each filtered relation, and then unioning everything back together. This means a lot of extra map jobs, which is a lot of extra work. Thinking about it in terms of mapping, I should be able to do all of this in one pass. Any idea what the Pig syntax for that would be? Is there something like a conditional GENERATE, where I could write something like:

all_time = FOREACH main_set GENERATE date, 'likes' IF (blah == 'a' AND meh == 'b')
 'dislikes' IF (blah == 'b' AND meh == 'c')
 'newuserregs' IF (blah == 'c' AND meh == 'd') AS type;

Running this in just one map job would be very awesome and would speed this script up a ton, I'm thinking. Ideas? Advice?

Eli
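
For reference, the single-pass idea described above could be sketched with Pig's bincond operator (`cond ? a : b`), nested once per condition. This is only a sketch under assumptions: main_set is taken to have the fields date, blah and meh as in the script above, the relation name `labeled` and the sentinel value 'other' are made up for illustration, and the conditions are assumed to be mutually exclusive (as they are here), so each row gets at most one label just as it would from the UNION version.

```pig
-- Assumption: main_set has the fields date, blah, meh, as in the original script.
-- Label every row in one FOREACH using nested bincond (?:) expressions.
-- 'other' is a hypothetical sentinel for rows that match none of the conditions.
labeled = FOREACH main_set GENERATE
    date,
    ((blah == 'a' AND meh == 'b') ? 'likes' :
     ((blah == 'b' AND meh == 'c') ? 'dislikes' :
      ((blah == 'c' AND meh == 'd') ? 'newuserregs' : 'other'))) AS type;

-- Drop unmatched rows so the result mirrors the FILTER + UNION version.
all_time = FILTER labeled BY type != 'other';
```

Each bincond has to be wrapped in its own parentheses, which is why the nesting looks heavy. If the conditions could overlap, this version would keep only the first matching label per row, whereas the UNION version would emit one row per match.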
David LaBarbera 2012-10-24, 13:57
Eli Finkelshteyn 2012-10-24, 07:08
Alan Gates 2012-10-24, 17:21