Pig, mail # user - Regex expression in FOREACH


Re: Regex expression in FOREACH
praveenesh kumar 2012-02-10, 19:30
No, this is not what I was asking for.
I mean: suppose I have column names like:

1. Name
2. Update1
3. Update50
4. Update100
5. Total
6. Description

I want to generate all the columns whose names start with "Update".

If I have a small number of columns, I can do this by eyeballing. But if I
have something like 100 columns, it gets difficult.
We can do this in Hive, and in SQL as well. Is it possible in Pig
too, generating columns using some kind of regex?
Thanks,
Praveenesh
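The thread does not show a built-in way to project columns by a name pattern in Pig. One workaround, sketched here under assumptions, is to filter the field names outside Pig and generate the FOREACH statement text from them. This assumes you already have the list of field names (for example copied from DESCRIBE output or from a header line); the function `build_foreach` and the relation names `A` and `B` are hypothetical.

```python
import re

def build_foreach(relation, alias, field_names, pattern):
    """Return a Pig 'alias = FOREACH relation GENERATE ...;' statement that
    projects only the field names matching `pattern`.

    Uses re.search, so anchor the pattern yourself: '^Update' for a prefix,
    'XYZ$' for a suffix.
    """
    regex = re.compile(pattern)
    selected = [name for name in field_names if regex.search(name)]
    if not selected:
        raise ValueError("no field names match %r" % pattern)
    return "%s = FOREACH %s GENERATE %s;" % (alias, relation, ", ".join(selected))

# Column names from the example in the message above.
columns = ["Name", "Update1", "Update50", "Update100", "Total", "Description"]
print(build_foreach("A", "B", columns, r"^Update"))
# prints: B = FOREACH A GENERATE Update1, Update50, Update100;
```

The generated line can then be pasted into (or templated into) the Pig script, so the regex is applied once to the schema rather than to each record.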

On Fri, Feb 10, 2012 at 11:38 PM, Grig Gheorghiu
<[EMAIL PROTECTED]> wrote:

> You can use the EXTRACT UDF from piggybank.
>
> REGISTER file:/home/hadoop/lib/pig/piggybank.jar;
> DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();
>
> Assume relation A contains tuples with a field called key of the form:
>
> id=123232|val=asdsa|
>
> Then you can extract the id field like this:
>
> B = FOREACH A GENERATE
>        FLATTEN(
>                EXTRACT(key, 'id=([^\\|]+)[\\|]*')
>        )
>        AS (
>                id: chararray
> );
>
> Note that each backslash must itself be escaped inside the Pig string
> literal, hence the \\.
>
> HTH,
>
> Grig
> On Fri, Feb 10, 2012 at 3:22 AM, praveenesh kumar <[EMAIL PROTECTED]>
> wrote:
> > Is it possible to specify a regex in a FOREACH statement to
> > generate only the columns selected by that regex?
> >
> > Suppose I want to generate only those columns whose names end with 'XYZ'.
> > Is it possible to do this in Pig using some regex?
> >
> > Thanks,
> > Praveenesh
>
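To illustrate what the regex in Grig's EXTRACT example captures, here is a small Python sketch. It assumes that Pig turns each `\\` in the string literal into a single `\` before the regex reaches the engine. Only that one escape is modeled, which is a simplification. It also uses search semantics (match anywhere in the string); the thread does not say whether EXTRACT searches or requires a full match. For this particular pattern, Python's `re` and Java's regex engine behave the same. `extract_id` is a hypothetical helper, not part of piggybank.

```python
import re

# The regex exactly as written inside the Pig string literal above.
PIG_LITERAL = r"id=([^\\|]+)[\\|]*"

# Simplified model of Pig's literal unescaping: each "\\" becomes "\".
REGEX = PIG_LITERAL.replace("\\\\", "\\")

def extract_id(key):
    """Return the first capture group (the id value), or None if no match."""
    m = re.search(REGEX, key)
    return m.group(1) if m else None

print(REGEX)                               # prints: id=([^\|]+)[\|]*
print(extract_id("id=123232|val=asdsa|"))  # prints: 123232
```

This also shows why the reply did not answer the question. EXTRACT pulls a substring out of the contents of a single field. It does not select fields by name.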