Pig, mail # user - Pig Conditionals (Do I have to use UDFs)?


Re: Pig Conditionals (Do I have to use UDFs)?
Eli Finkelshteyn 2011-09-14, 21:53
Ah, neat! That would do the trick. Seems like a lot of extra steps, but
I'll take it if that's how it's done in PIG. Thanks!
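
Ryan's message below shows the SPLIT but not the FOREACH and UNION steps that produced his output. Here is a minimal sketch of one way to write the whole pipeline. It assumes the (item:chararray, number:int) schema from his earlier JOIN example. The *_LABELED and LABELED relation names and the label field are hypothetical.

```pig
-- Assumes the input schema from the earlier JOIN example.
EXAMPLE_SOURCE = LOAD 'example.txt' USING PigStorage(',') AS (item:chararray, number:int);

-- SPLIT is not exclusive: each tuple goes to every branch whose condition it satisfies.
SPLIT EXAMPLE_SOURCE INTO GOOD IF number > 5,
    BETTER IF (number >= 2 AND number <= 4),
    BEST IF (number >= 5);

-- Tag each branch with a constant label (hypothetical relation and field names).
GOOD_LABELED   = FOREACH GOOD   GENERATE item, number, 'good'   AS label;
BETTER_LABELED = FOREACH BETTER GENERATE item, number, 'better' AS label;
BEST_LABELED   = FOREACH BEST   GENERATE item, number, 'best'   AS label;

-- Recombine the branches into one relation; UNION does not guarantee output order.
LABELED = UNION BEST_LABELED, GOOD_LABELED, BETTER_LABELED;
DUMP LABELED;
```

Because the branches can overlap, (a,6) appears as both best and good in Ryan's output. Tuples that match no condition, such as number == 1 here, are dropped. If each tuple should get exactly one label, the SPLIT conditions have to be written so they don't overlap.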

On 9/14/11 5:51 PM, Ryan Hoegg wrote:
> What about trying something with SPLIT and UNION:
>
> SPLIT EXAMPLE_SOURCE INTO GOOD IF number > 5,
>     BETTER IF (number >= 2 AND number <= 4),
>     BEST IF (number >= 5);
>
> I did a few FOREACH statements and a UNION, and got this:
> (a,6,best)
> (b,5,best)
> (d,8,best)
> (a,6,good)
> (d,8,good)
> (a,2,better)
> (b,2,better)
> (c,3,better)
> (d,3,better)
> (d,4,better)
>
> --
> Ryan Hoegg
>
> On Wed, Sep 14, 2011 at 4:24 PM, Eli Finkelshteyn <[EMAIL PROTECTED]> wrote:
>
>> Sorry, bad example, I guess. I want something I can write case statements
>> with. In this case I could map instead, but with less straightforward
>> cases (e.g. one case where number == 1, another where number is between
>> 2 and 4, another where number is greater than 5, etc.), mapping would be
>> much more difficult.
>>
>> Again, I know this is something I can do with UDFs, but it seemed light
>> enough to be built into Pig itself, so I was hoping there was a way to do
>> it without writing a UDF every time I have a new transformation to make.
>>
>> Eli
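
The range cases listed above can still be handled without a UDF. You nest bincond (?:) expressions, the same approach as in the original question. The first condition that matches wins, so the cases are exclusive, like SQL's CASE. Here is a sketch. It assumes a relation with the (item:chararray, number:int) schema, and the label strings are placeholders.

```pig
-- Assumed input schema; the file name follows the earlier examples in this thread.
rel = LOAD 'example.txt' USING PigStorage(',') AS (item:chararray, number:int);

-- Conditions are checked left to right; the first true one determines the label.
labeled = FOREACH rel GENERATE item,
    (number == 1 ? 'one'
        : ((number >= 2 AND number <= 4) ? 'two to four'
            : (number > 5 ? 'more than five' : 'other'))) AS label;

-- number == 5 (and anything below 1) matches no explicit case and falls through to 'other'.
DUMP labeled;
```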
>>
>> On 9/14/11 5:07 PM, Ryan Hoegg wrote:
>>
>>> What about putting the mappings into their own relation? I tried this
>>> with Pig 0.9.0:
>>>
>>> example.txt:
>>> a,1
>>> a,2
>>> b,2
>>> c,1
>>> d,3
>>> d,4
>>>
>>> mapping.txt:
>>> 1,one
>>> 2,two
>>> 3,three
>>> 4,four
>>>
>>> MAPPINGS = LOAD 'mapping.txt' USING PigStorage(',') AS
>>> (number:int,name:chararray);
>>> EXAMPLE_SOURCE = LOAD 'example.txt' USING PigStorage(',') AS
>>> (item:chararray,number:int);
>>> MAPPED = JOIN EXAMPLE_SOURCE BY number LEFT OUTER, MAPPINGS BY number;
>>> PRETTY = FOREACH MAPPED GENERATE item, name;
>>> DUMP PRETTY;
>>> (a,one)
>>> (c,one)
>>> (a,two)
>>> (b,two)
>>> (d,three)
>>> (d,four)
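
With LEFT OUTER, a number that is missing from mapping.txt still produces a row, but its name field is null. This sketch substitutes a default in the final FOREACH. It assumes the same two input files, and 'unknown' is an arbitrary placeholder.

```pig
MAPPINGS = LOAD 'mapping.txt' USING PigStorage(',') AS (number:int, name:chararray);
EXAMPLE_SOURCE = LOAD 'example.txt' USING PigStorage(',') AS (item:chararray, number:int);

-- LEFT OUTER keeps every EXAMPLE_SOURCE row; unmatched rows get a null name.
MAPPED = JOIN EXAMPLE_SOURCE BY number LEFT OUTER, MAPPINGS BY number;

-- Replace a null name with a default label.
PRETTY = FOREACH MAPPED GENERATE item, (name IS NULL ? 'unknown' : name) AS name;
DUMP PRETTY;
```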
>>>
>>> --
>>> Ryan Hoegg
>>>
>>> On Wed, Sep 14, 2011 at 3:27 PM, Eli Finkelshteyn <[EMAIL PROTECTED]> wrote:
>>>> Hi,
>>>> I'd like to generate values based on exclusive conditions (something like
>>>> the CASE statement in SQL). An example:
>>>>
>>>> Say I have data that looks like:
>>>>
>>>> (a, 1)
>>>> (a, 2)
>>>> (b, 2)
>>>> (c, 1)
>>>> (d, 3)
>>>> (d, 4)
>>>>
>>>> And I just want to convert each of the numbers to its written form to get:
>>>>
>>>> (a, one)
>>>> (a, two)
>>>> (b, two)
>>>> (c, one)
>>>> (d, three)
>>>> (d, four)
>>>>
>>>> Would I need to write a UDF for that, or is there some simple way to do it
>>>> using cases? I know I can nest a bunch of bincond (?:) expressions one on
>>>> top of the other to achieve this, like:
>>>>
>>>> FOREACH rel GENERATE $0, (($1 == 1) ? 'one' : (($1 == 2) ? 'two' :
>>>>     (($1 == 3) ? 'three' : 'four')));
>>>>
>>>> but that seems too messy. I'd appreciate any advice.
>>>>
>>>> Thanks!
>>>> Eli