Pig >> mail # user >> Regex expression in FOREACH


Re: Regex expression in FOREACH
Any info on this? It's kind of urgent.

Thanks,
Praveenesh
On Sat, Feb 11, 2012 at 1:02 AM, Grig Gheorghiu <[EMAIL PROTECTED]> wrote:

> Ah OK I get it....but I don't know the answer. Hopefully somebody on
> the list will reply, it's an interesting problem.
>
> On Fri, Feb 10, 2012 at 11:30 AM, praveenesh kumar <[EMAIL PROTECTED]>
> wrote:
> > No, this is not what I was asking for.
> > I mean, suppose I have column names like:
> >
> > 1. Name
> > 2. Update1
> > 3. Update50
> > 4. Update100
> > 5. Total
> > 6. Description
> >
> > I want to generate only the columns that start with Update.
> >
> > If I have a small number of columns, I can do this by eyeballing. But if I
> > have around 100 columns, it's kind of difficult.
> > In Hive we can do this, as in SQL. I want to know whether it is also
> > possible in Pig to generate columns using some kind of regex.
> >
> >
> > Thanks,
> > Praveenesh
> >
> >
> > On Fri, Feb 10, 2012 at 11:38 PM, Grig Gheorghiu <[EMAIL PROTECTED]>
> > wrote:
> >>
> >> You can use EXTRACT.
> >>
> >> REGISTER file:/home/hadoop/lib/pig/piggybank.jar;
> >> DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();
> >>
> >> Assume relation A contains tuples with a field called key of the form:
> >>
> >> id=123232|val=asdsa|
> >>
> >> Then you can extract the id field like this:
> >>
> >> B = FOREACH A GENERATE
> >>        FLATTEN(
> >>                EXTRACT(key, 'id=([^\\|]+)[\\|]*')
> >>        )
> >>        AS (
> >>                id: chararray
> >> );
> >>
> >> Note that each backslash needs to be escaped, hence the \\.
> >>
> >> HTH,
> >>
> >> Grig
> >> On Fri, Feb 10, 2012 at 3:22 AM, praveenesh kumar <[EMAIL PROTECTED]>
> >> wrote:
> >> > Is it possible to specify regex expressions in a FOREACH statement to
> >> > generate only the columns selected by the regex?
> >> >
> >> > Suppose I want to generate only those columns that end with 'XYZ'. Is it
> >> > possible to do this in Pig using some regex?
> >> >
> >> > Thanks,
> >> > Praveenesh
> >
> >
>
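
Nobody in the thread gives a way to select columns by regex inside FOREACH itself. One possible workaround, sketched here in Python, is to build the GENERATE list outside Pig and paste the resulting statement into the script. This assumes the column names are already known (for example, copied by hand from the relation's schema). The helper name build_generate and the relation names A and B are hypothetical and not part of Pig or piggybank.

```python
import re


def build_generate(columns, pattern, src="A", dst="B"):
    """Return a Pig FOREACH statement that projects only the columns
    whose names match the given regex (hypothetical helper)."""
    rx = re.compile(pattern)
    selected = [name for name in columns if rx.search(name)]
    if not selected:
        raise ValueError("no column matches %r" % pattern)
    return "%s = FOREACH %s GENERATE %s;" % (dst, src, ", ".join(selected))


# Column names taken from the example in the thread.
columns = ["Name", "Update1", "Update50", "Update100", "Total", "Description"]

# Keep only the columns that start with "Update".
statement = build_generate(columns, r"^Update")
print(statement)
```

The same helper covers the original "ends with 'XYZ'" case by passing a pattern such as r"XYZ$". The generated text is an ordinary FOREACH statement, so it does not depend on any regex support in Pig.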