Pig, mail # user - FOREACH GENERATE Conditional?


Eli Finkelshteyn 2012-10-24, 06:51
David LaBarbera 2012-10-24, 13:57
Re: FOREACH GENERATE Conditional?
Eli Finkelshteyn 2012-10-24, 07:08
Sorry, forgot to mention: I know I can use a UDF for this, but I was hoping there's a pure Pig approach.

Eli

On Oct 23, 2012, at 11:51 PM, Eli Finkelshteyn wrote:

> Hi folks,
> I have a pig script that right now looks like this:
>
> …
> likes = FILTER main_set BY blah == 'a' AND meh == 'b';
> likes_time = FOREACH likes GENERATE date, 'likes' AS type;
>
> dislikes = FILTER main_set BY blah == 'b' AND meh == 'c';
> dislikes_time = FOREACH dislikes GENERATE date, 'dislikes' AS type;
>
> newuserregs = FILTER main_set BY blah == 'c' AND meh == 'd';
> newuserregs_time = FOREACH newuserregs GENERATE date, 'newuserregs' AS type;
> ...
>
> all_time = UNION likes_time, dislikes_time, newuserregs_time;
> …
>
> As you can see, I'm filtering main_set repeatedly, generating a tagged relation from each filter, and then unioning everything back together. This means a lot of extra map jobs, which is a lot of extra work. Thinking about it in terms of mapping, I should be able to do all of this in one pass. Any idea what the Pig syntax would be for that? Is there something like a conditional GENERATE, where I could do something like:
>
> all_time = FOREACH main_set GENERATE date, 'likes' IF (blah == 'a' AND meh == 'b')
>  'dislikes' IF (blah == 'b' AND meh == 'c')
>  'newuserregs' IF (blah == 'c' AND meh == 'd') AS type;
>
> Running this in a single map job would be awesome and, I'm thinking, would speed this script up a ton. Ideas? Advice?
>
> Eli
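For reference, here is a minimal pure-Pig sketch of what the hypothetical conditional GENERATE above is after. It uses Pig's bincond operator (`(condition ? value_if_true : value_if_false)`), nested to cover the three cases, so every row is tagged in a single FOREACH, and then a FILTER drops rows that match none of the conditions. This is not taken from the replies in this thread. The relation and field names (`main_set`, `date`, `blah`, `meh`, `type`) come from the script above. The LOAD path `'input_path'`, the schema, and the `'none'` sentinel are hypothetical placeholders assumed for illustration.

```pig
-- Hypothetical input: the path and schema are placeholders, not from the original script.
main_set = LOAD 'input_path' AS (date:chararray, blah:chararray, meh:chararray);

-- Tag every row in one FOREACH using nested bincond expressions.
-- Rows matching none of the three conditions get the placeholder tag 'none'.
tagged = FOREACH main_set GENERATE date,
    ((blah == 'a' AND meh == 'b') ? 'likes'
        : ((blah == 'b' AND meh == 'c') ? 'dislikes'
        : ((blah == 'c' AND meh == 'd') ? 'newuserregs' : 'none'))) AS type;

-- Keep only rows tagged with one of the three categories.
all_time = FILTER tagged BY type != 'none';
```

Because the three conditions in this example are mutually exclusive (each requires a different value of `blah`), the nested bincond should assign the same tags as the FILTER/UNION version, with no UNION step. Pig's built-in SPLIT operator (`SPLIT main_set INTO likes IF ..., dislikes IF ...;`) is another way to partition a relation in one statement, though each resulting relation would still need its own tagging FOREACH and a UNION afterwards.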
Alan Gates 2012-10-24, 17:21