Pig >> mail # user >> Pig Conditionals (Do I have to use UDFs)?


Re: Pig Conditionals (Do I have to use UDFs)?
Sorry, bad example, I guess. I want something I can write case statements
with. In this case I could use a mapping instead, but if I wanted less
straightforward cases (e.g. one case where number == 1, another where
number is between 2 and 4, another where number is greater than 5, etc.),
it would be much more difficult to do with a mapping.
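
As a rough sketch of the range-based cases described above, nested conditional (bincond) expressions in Pig Latin might look like the following. The relation names, the bucket labels, and the 'other' fallback are assumptions made for illustration; the input file and schema are borrowed from the example quoted below.

```pig
-- Hypothetical sketch: relation names and bucket labels are illustrative only.
rel = LOAD 'example.txt' USING PigStorage(',') AS (item:chararray, number:int);

-- Each bincond is fully parenthesized; conditions are tested top to bottom.
bucketed = FOREACH rel GENERATE item,
    ((number == 1) ? 'one'
      : (((number >= 2) AND (number <= 4)) ? 'two_to_four'
        : ((number > 5) ? 'over_five' : 'other'))) AS bucket;

DUMP bucketed;
```

Note that the ranges as stated leave number == 5 uncovered, so the sketch sends it to the assumed 'other' label.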

Again, I know this is something I can do with UDFs, but it seemed like
something light enough to be built into Pig itself, so I was hoping
there was a way to do it without needing to write a UDF every time I
have a new transformation to make.

Eli

On 9/14/11 5:07 PM, Ryan Hoegg wrote:
> What about putting the mappings into their own relation?  I tried this with
> 0.9.0:
>
> example.txt:
> a,1
> a,2
> b,2
> c,1
> d,3
> d,4
>
> mapping.txt:
> 1,one
> 2,two
> 3,three
> 4,four
>
> MAPPINGS = LOAD 'mapping.txt' USING PigStorage(',') AS
> (number:int,name:chararray);
> EXAMPLE_SOURCE = LOAD 'example.txt' USING PigStorage(',') AS
> (item:chararray,number:int);
> MAPPED = JOIN EXAMPLE_SOURCE BY number LEFT OUTER, MAPPINGS BY number;
> PRETTY = FOREACH MAPPED GENERATE item, name;
> DUMP PRETTY;
> (a,one)
> (c,one)
> (a,two)
> (b,two)
> (d,three)
> (d,four)
>
> --
> Ryan Hoegg
>
> On Wed, Sep 14, 2011 at 3:27 PM, Eli Finkelshteyn <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>> I'd like to generate based on exclusive conditions (something like the CASE
>> statement in SQL). An example:
>>
>> Say I have data that looks like:
>>
>> (a, 1)
>> (a, 2)
>> (b, 2)
>> (c, 1)
>> (d, 3)
>> (d, 4)
>>
>> And I want to just convert each of the numbers to their written forms to
>> get:
>>
>> (a, one)
>> (a, two)
>> (b, two)
>> (c, one)
>> (d, three)
>> (d, four)
>>
>> Would I need to write a UDF for that, or is there some simple way to do it
>> using cases? I know I can nest a bunch of bincond expressions one inside
>> the other to achieve this, like:
>>
>> FOREACH rel GENERATE $0, (($1==1) ? 'one' : (($1 == 2) ? 'two' : (($1 == 3)
>> ? 'three' : 'four')));
>>
>> but that seems too messy. I'd appreciate any advice.
>>
>> Thanks!
>> Eli
>>
>>
>>