Pig >> mail # user >> Regex expression in FOREACH


praveenesh kumar 2012-02-10, 11:22
Grig Gheorghiu 2012-02-10, 18:08
Re: Regex expression in FOREACH
No, this is not what I was asking for.
What I mean is: suppose I have column names like:

1. Name
2. Update1
3. Update50
4. Update100
5. Total
6. Description

I want to generate all the columns whose names start with 'Update'.

If I have a small number of columns, I can pick them out by eye. But if I
have around 100 columns, it is quite difficult.
In Hive we can do this, as in SQL. I want to know whether it is also possible
in Pig to generate columns using some kind of regex.
Thanks,
Praveenesh
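Since the thread does not show a built-in way to project columns by regex in a FOREACH, one possible workaround is to build the GENERATE list outside Pig from the known column names. The Python sketch below is only an illustration under that assumption: `select_columns` and `build_generate` are hypothetical helpers, the relation name `A` and alias `B` are placeholders, and the column list is the one from the message above. It uses `re.fullmatch`, so the pattern must match the whole column name (`Update.*` for a prefix, `.*XYZ` for a suffix).

```python
import re

def select_columns(columns, pattern):
    """Return the column names that fully match `pattern`, in schema order."""
    regex = re.compile(pattern)
    return [c for c in columns if regex.fullmatch(c)]

def build_generate(relation, alias, columns, pattern):
    """Build a Pig Latin FOREACH ... GENERATE statement for the matching columns.

    Hypothetical helper: it only produces text; it does not run Pig.
    """
    selected = select_columns(columns, pattern)
    if not selected:
        raise ValueError(f"no column matches {pattern!r}")
    return f"{alias} = FOREACH {relation} GENERATE {', '.join(selected)};"

# Column names taken from the example in the message.
schema = ["Name", "Update1", "Update50", "Update100", "Total", "Description"]
print(build_generate("A", "B", schema, r"Update.*"))
# B = FOREACH A GENERATE Update1, Update50, Update100;
```

The resulting statement could then be pasted into the script or supplied through Pig's parameter substitution (`-param`); how the column names are obtained in the first place is left open here.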

On Fri, Feb 10, 2012 at 11:38 PM, Grig Gheorghiu
<[EMAIL PROTECTED]> wrote:

> You can use EXTRACT.
>
> REGISTER file:/home/hadoop/lib/pig/piggybank.jar;
> DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();
>
> Assume relation A contains tuples with a field called key of the form:
>
> id=123232|val=asdsa|
>
> Then you can extract the id field like this:
>
> B = FOREACH A GENERATE
>        FLATTEN(
>                EXTRACT(key, 'id=([^\\|]+)[\\|]*')
>        )
>        AS (
>                id: chararray
> );
>
> Note that each backslash needs to be escaped, hence the \\.
>
> HTH,
>
> Grig
> On Fri, Feb 10, 2012 at 3:22 AM, praveenesh kumar <[EMAIL PROTECTED]> wrote:
> > Is it possible to specify regular expressions in a FOREACH statement to
> > generate only the columns selected by the regex?
> >
> > Suppose I want to generate only those columns that end with 'XYZ'. Is it
> > possible to do this in Pig using some regex?
> >
> > Thanks,
> > Praveenesh
>
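To see what Grig's EXTRACT pattern actually captures, here is a rough Python equivalent (an illustration, not how piggybank implements it). The Pig literal `'id=([^\\|]+)[\\|]*'` reaches the regex engine as `id=([^\|]+)[\|]*` once each `\\` is unescaped, which is written below as a Python raw string. `extract_groups` is a hypothetical helper that returns the capture groups as a tuple, roughly the tuple that the Pig example FLATTENs into `id`. It uses `re.search`, which finds the first match anywhere in the string; whether EXTRACT matches that way or requires the whole field to match is an assumption here.

```python
import re

# Pig's 'id=([^\\|]+)[\\|]*' becomes id=([^\|]+)[\|]* after Pig unescapes
# each doubled backslash; a raw string expresses that directly in Python.
ID_PATTERN = re.compile(r"id=([^\|]+)[\|]*")

def extract_groups(value, pattern=ID_PATTERN):
    """Return the capture groups of the first match as a tuple, or None.

    Hypothetical helper; assumes search (find-anywhere) semantics.
    """
    m = pattern.search(value)
    return m.groups() if m else None

row = "id=123232|val=asdsa|"  # the sample key value from the message
print(extract_groups(row))  # ('123232',)
```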
Alan Gates 2012-02-11, 18:28
praveenesh kumar 2012-02-11, 17:19