Pig, mail # user - Regex expression in FOREACH


praveenesh kumar 2012-02-10, 11:22
Grig Gheorghiu 2012-02-10, 18:08
praveenesh kumar 2012-02-10, 19:30
Alan Gates 2012-02-11, 18:28
Re: Regex expression in FOREACH
praveenesh kumar 2012-02-11, 17:19
Any info on this? It's kind of urgent.

Thanks,
Praveenesh
On Sat, Feb 11, 2012 at 1:02 AM, Grig Gheorghiu <[EMAIL PROTECTED]> wrote:

> Ah OK, I get it... but I don't know the answer. Hopefully somebody on
> the list will reply; it's an interesting problem.
>
> On Fri, Feb 10, 2012 at 11:30 AM, praveenesh kumar <[EMAIL PROTECTED]>
> wrote:
> > No, this is not what I was asking for.
> > I mean, suppose I have column names like:
> >
> > 1. Name
> > 2. Update1
> > 3. Update50
> > 4. Update100
> > 5. Total
> > 6. Description
> >
> > I want to generate all the columns whose names start with Update.
> >
> > If I have a small number of columns, I can do this by eyeballing. But if I
> > have around 100 columns, it's kind of difficult.
> > In Hive we can do this, and likewise in SQL. I want to know whether it is
> > possible in Pig also: generating columns using some kind of regex?
> >
> >
> > Thanks,
> > Praveenesh
> >
> >
> > On Fri, Feb 10, 2012 at 11:38 PM, Grig Gheorghiu <[EMAIL PROTECTED]> wrote:
> >>
> >> You can use EXTRACT.
> >>
> >> REGISTER file:/home/hadoop/lib/pig/piggybank.jar;
> >> DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();
> >>
> >> Assume relation A contains tuples with a field called key of the form:
> >>
> >> id=123232|val=asdsa|
> >>
> >> Then you can extract the id field like this:
> >>
> >> B = FOREACH A GENERATE
> >>        FLATTEN(
> >>                EXTRACT(key, 'id=([^\\|]+)[\\|]*')
> >>        )
> >>        AS (
> >>                id: chararray
> >> );
> >>
> >> Note that each backslash needs to be escaped, hence the \\.
> >>
> >> HTH,
> >>
> >> Grig
> >> On Fri, Feb 10, 2012 at 3:22 AM, praveenesh kumar <[EMAIL PROTECTED]> wrote:
> >> > Is it possible to specify regular expressions in a FOREACH statement to
> >> > generate only the columns selected by the regex?
> >> >
> >> > Suppose I want to generate only those columns whose names end with 'XYZ'.
> >> > Is it possible to do this in Pig using some regex?
> >> >
> >> > Thanks,
> >> > Praveenesh
> >
> >
>
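
The thread as shown here does not settle whether Pig can project columns by regex. One possible workaround, sketched below and not taken from any of the replies, is to pick the column names with a regex outside Pig and generate the FOREACH statement as text. It assumes the column names are known before the script runs. The helper names select_columns and foreach_statement and the aliases A and B are hypothetical, and Python is used only for illustration.

```python
import re


def select_columns(columns, pattern):
    """Return the column names that fully match the regex, in schema order."""
    rx = re.compile(pattern)
    return [c for c in columns if rx.fullmatch(c)]


def foreach_statement(out_alias, in_alias, columns):
    """Build the text of a Pig FOREACH ... GENERATE statement for the columns."""
    if not columns:
        raise ValueError("no columns matched the pattern")
    return f"{out_alias} = FOREACH {in_alias} GENERATE {', '.join(columns)};"


# Column names from the example earlier in the thread.
columns = ["Name", "Update1", "Update50", "Update100", "Total", "Description"]

# Keep only the columns whose names start with "Update".
update_cols = select_columns(columns, r"Update.*")

# Aliases A and B are placeholders for the input and output relations.
statement = foreach_statement("B", "A", update_cols)
print(statement)
```

The printed line can be pasted into a Pig script or written to one by a wrapper. For the "ends with XYZ" case from the original question, the pattern would be r".*XYZ".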