Pig >> mail # user >> Pig Conditionals (Do I have to use UDFs)?


Re: Pig Conditionals (Do I have to use UDFs)?
Ah, neat! That would do the trick. Seems like a lot of extra steps, but
I'll take it if that's how it's done in PIG. Thanks!

On 9/14/11 5:51 PM, Ryan Hoegg wrote:
> What about trying something with SPLIT and UNION:
>
> SPLIT EXAMPLE_SOURCE INTO GOOD IF number>5, BETTER IF (number>=2 AND
> number<=4), BEST IF (number>=5);
>
> I did a few FOREACH and a UNION, and got this:
> (a,6,best)
> (b,5,best)
> (d,8,best)
> (a,6,good)
> (d,8,good)
> (a,2,better)
> (b,2,better)
> (c,3,better)
> (d,3,better)
> (d,4,better)
>
> --
> Ryan Hoegg
>
> On Wed, Sep 14, 2011 at 4:24 PM, Eli Finkelshteyn <[EMAIL PROTECTED]> wrote:
>
>> Sorry, bad example, I guess. I want something I can do case statements
>> with. In this case I could map instead, but if I wanted to use less
>> straightforward cases (e.g. one case where number == 1, another where
>> number is between 2 and 4, another where number is greater than 5, etc.), it
>> would be much more difficult to do with mapping.
>>
>> Again, I know this is something I can do with udfs, but it seemed like
>> something light enough to be built into PIG itself, so I was hoping there
>> was a way to do it without needing to write a udf every time I have a new
>> transformation to make.
>>
>> Eli
>>
>> On 9/14/11 5:07 PM, Ryan Hoegg wrote:
>>
>>> What about putting the mappings into their own relation?  I tried this
>>> with 0.9.0:
>>>
>>> example.txt:
>>> a,1
>>> a,2
>>> b,2
>>> c,1
>>> d,3
>>> d,4
>>>
>>> mapping.txt:
>>> 1,one
>>> 2,two
>>> 3,three
>>> 4,four
>>>
>>> MAPPINGS = LOAD 'mapping.txt' USING PigStorage(',') AS
>>> (number:int,name:chararray);
>>> EXAMPLE_SOURCE = LOAD 'example.txt' USING PigStorage(',') AS
>>> (item:chararray,number:int);
>>> MAPPED = JOIN EXAMPLE_SOURCE BY number LEFT OUTER, MAPPINGS BY number;
>>> PRETTY = FOREACH MAPPED GENERATE item, name;
>>> DUMP PRETTY;
>>> (a,one)
>>> (c,one)
>>> (a,two)
>>> (b,two)
>>> (d,three)
>>> (d,four)
>>>
>>> --
>>> Ryan Hoegg
>>>
>>> On Wed, Sep 14, 2011 at 3:27 PM, Eli Finkelshteyn <[EMAIL PROTECTED]> wrote:
>>>> Hi,
>>>> I'd like to generate based on exclusive conditions (something like the
>>>> CASE statement in SQL). An example:
>>>>
>>>> Say I have data that looks like:
>>>>
>>>> (a, 1)
>>>> (a, 2)
>>>> (b, 2)
>>>> (c, 1)
>>>> (d, 3)
>>>> (d, 4)
>>>>
>>>> And I want to just convert each of the numbers to their written forms to
>>>> get:
>>>>
>>>> (a, one)
>>>> (a, two)
>>>> (b, two)
>>>> (c, one)
>>>> (d, three)
>>>> (d, four)
>>>>
>>>> Would I need to write a udf for that, or is there some simple way to do it
>>>> using cases? I know I can nest a bunch of bincond (?:) expressions one
>>>> inside the other to achieve this, like:
>>>>
>>>> FOREACH rel GENERATE $0, (($1 == 1) ? 'one' : (($1 == 2) ? 'two' : (($1 == 3)
>>>> ? 'three' : 'four')));
>>>>
>>>> but that seems too messy. I'd appreciate any advice.
>>>>
>>>> Thanks!
>>>> Eli
>>>>
>>>>
>>>>
>>>>
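
For reference, the SPLIT and UNION approach from Ryan's second reply can be sketched roughly as follows. This is only a guess at the "few FOREACH and a UNION" steps, not the script he actually ran: the intermediate relation names (GOOD_L, BETTER_L, BEST_L, LABELED), the `rating` field name, and the input file are assumptions made for illustration. The SPLIT conditions are copied from his message.

```pig
-- Assumed input: comma-separated (item, number) rows, same layout as example.txt.
EXAMPLE_SOURCE = LOAD 'example.txt' USING PigStorage(',') AS (item:chararray, number:int);

-- SPLIT routes each row to every relation whose condition it satisfies.
SPLIT EXAMPLE_SOURCE INTO
    GOOD IF number > 5,
    BETTER IF (number >= 2 AND number <= 4),
    BEST IF (number >= 5);

-- Tag each branch with a constant label (hypothetical field name: rating).
GOOD_L   = FOREACH GOOD   GENERATE item, number, 'good'   AS rating;
BETTER_L = FOREACH BETTER GENERATE item, number, 'better' AS rating;
BEST_L   = FOREACH BEST   GENERATE item, number, 'best'   AS rating;

-- Recombine the labeled branches into a single relation.
LABELED = UNION BEST_L, GOOD_L, BETTER_L;
DUMP LABELED;
```

SPLIT conditions are not exclusive, so with the conditions above any row with number > 5 lands in both GOOD and BEST, which matches the duplicated rows in Ryan's output. Rows that match no condition (number == 1 here) are dropped. To get behavior closer to a SQL CASE, the conditions would have to be written so that they do not overlap and together cover every row.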