Re: FOREACH GENERATE Conditional?
Alan Gates 2012-10-24, 17:21
Are you sure Pig is spawning extra map jobs for this? The multi-query optimizer should be pushing these back together into one job.
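One way to check, offered as a sketch rather than something run against this script: ask Pig for the plan of the final alias and count the jobs in the MapReduce plan it prints. This assumes the script ends with the UNION into all_time, as in the quoted message below.

```pig
-- Assumes main_set, the three filtered relations, and the final
--   all_time = UNION likes_time, dislikes_time, newuserregs_time;
-- are already defined, as in the quoted script.
-- EXPLAIN prints the logical, physical, and MapReduce plans for the alias.
EXPLAIN all_time;
```

If the MapReduce plan shows a single job reading main_set, the filters have already been merged and the rewrite is not needed for performance.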
If it isn't, you should be able to accomplish the same thing with the ternary (bincond) operator and a single filter:

    all_typed = foreach main_set generate date, ((blah == 'a' and meh == 'b') ? 'likes' : ((blah == 'b' and meh == 'c') ? 'dislikes' : ((blah == 'c' and meh == 'd') ? 'newuserregs' : ''))) as type;
    all_time = filter all_typed by type != '';

(Not sure about all the parenthesis placement, as I didn't run it.)

Alan.

On Oct 24, 2012, at 2:51 AM, Eli Finkelshteyn wrote:

> Hi folks,
> I have a pig script that right now looks like this:
>
> …
> likes = FILTER main_set BY blah == 'a' AND meh == 'b';
> likes_time = FOREACH likes GENERATE date, 'likes' AS type;
>
> dislikes = FILTER main_set BY blah == 'b' AND meh == 'c';
> dislikes_time = FOREACH dislikes GENERATE date, 'dislikes' AS type;
>
> newuserregs = FILTER main_set BY blah == 'c' AND meh == 'd';
> newuserregs_time = FOREACH newuserregs GENERATE date, 'newuserregs' AS type;
> ...
>
> all_time = UNION likes_time, dislikes_time, newuserregs_time;
> …
>
> As you can see, what I'm doing is filtering the main_set repeatedly and generating based on that, and then unioning everything back together. This means a lot of extra map jobs, which is a lot of extra work. Really, thinking about it in terms of mapping, I should be able to do things in one run. Any idea what the pig syntax would be for that? Is there something like a GENERATE conditional, where I could do something like:
>
> all_time = FOREACH main_set GENERATE date, 'likes' IF (blah == 'a' AND meh == 'b')
>                                             'dislikes' IF (blah == 'b' AND meh == 'c')
>                                             'newuserregs' IF (blah == 'c' AND meh == 'd') AS type;
>
> Running this in just one map job would be very awesome and would speed this script up a ton, I'm thinking. Ideas? Advice?
>
> Eli