Pig >> mail # user >> FILTER and fields from tuple/bags


Re: FILTER and fields from tuple/bags
This is my Pig script so far, and it produces the output below. What I want
to do is arrange the values from that output in this order: NC,28613,55.

My question is: from this relation, how can I extract specific fields from
the bags and tuples? Essentially I want to do something like:

foreach rel GENERATE FIELD == ST, FIELD == ZIP, FIELD == AGE

That is, I want the fields in this order from a given relation. The problem
is that the data is arranged in bags that hold multiple tuples:
(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,04/03/12 11:36:25)
{(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,ST,NC),
(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,ZIP,28613),
(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,CITY,Xxxxxxx),
(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,NAM2,Xxxxx X &xxx; Xxxxx X Xxxxxx)}
{(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,PER,WKS,04/03/12 11:36:25,AGE,55),
(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,PER,WKS,04/03/12 11:36:25,OCCUP,xxxxxxx xxxxx),
(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,S2011US1040PER,WKS,04/03/12 11:36:25,MARITAL,Married)}

Snippet of the script:

-- Rows carrying the TSN field of form FINFOWKS; both joins below use D.
D = FILTER A BY F_ID == 'FINFOWKS' AND FIELD_ID == 'TSN';

-- Name, city, state and ZIP rows.
NM_CT_ST_FILTER = FILTER A BY (FIELD_ID == 'NAM2' OR FIELD_ID == 'CITY' OR
    FIELD_ID == 'ST' OR FIELD_ID == 'ZIP');
-- Age, occupation and marital-status rows from copy 1 of form WKS.
AG_OC_MT_FILTER = FILTER A BY (FIELD_ID == 'AGE' OR FIELD_ID == 'OCCUP' OR
    FIELD_ID == 'MARITAL') AND F_ID == 'WKS' AND F_COPY_NUM == '1';

-- Rename the columns with A_ / B_ prefixes before joining.
NM_CT_ST_FIELDS = FOREACH NM_CT_ST_FILTER GENERATE
    FILE_NAME AS A_FILE_NAME, F_ID AS A_F_ID, FSET_ID AS A_FSET_ID,
    F_ID_ROOT AS A_F_ID_ROOT, CREATED_DATE AS A_CREATED_DATE,
    FIELD_ID AS A_FIELD_ID, FIELD_VALUE AS A_FIELD_VALUE;
AG_OC_MT_FIELDS = FOREACH AG_OC_MT_FILTER GENERATE
    FILE_NAME AS B_FILE_NAME, F_ID AS B_F_ID, FSET_ID AS B_FSET_ID,
    F_ID_ROOT AS B_F_ID_ROOT, CREATED_DATE AS B_CREATED_DATE,
    FIELD_ID AS B_FIELD_ID, FIELD_VALUE AS B_FIELD_VALUE;

-- Join each set of rows to D.
A_JOIN = JOIN NM_CT_ST_FIELDS BY (A_FILE_NAME, A_CREATED_DATE, A_F_ID, A_F_ID_ROOT),
    D BY (FILE_NAME, CREATED_DATE, F_ID, F_ID_ROOT);
B_JOIN = JOIN AG_OC_MT_FIELDS BY (B_FILE_NAME, B_CREATED_DATE),
    D BY (FILE_NAME, CREATED_DATE);

A_JOIN_F = FOREACH A_JOIN GENERATE
    A_FILE_NAME, A_F_ID, A_FSET_ID, A_F_ID_ROOT, A_CREATED_DATE,
    A_FIELD_ID, A_FIELD_VALUE, FIELD_VALUE;
B_JOIN_F = FOREACH B_JOIN GENERATE
    B_FILE_NAME, B_F_ID, B_FSET_ID, B_F_ID_ROOT, B_CREATED_DATE,
    B_FIELD_ID, B_FIELD_VALUE;

-- Group both sides by file name and creation date.
FINAL = COGROUP A_JOIN_F BY (A_FILE_NAME, A_CREATED_DATE),
    B_JOIN_F BY (B_FILE_NAME, B_CREATED_DATE);
FINAL_DISTINCT = DISTINCT FINAL;
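
One possible way to get from this COGROUP output to rows like NC,28613,55 is the nested FOREACH with FILTER suggested in the quoted reply. The following is a minimal sketch, not a tested solution: the aliases EXTRACTED, st_rows, zip_rows and age_rows and the output path 'st_zip_age_out' are hypothetical names, and it assumes each group holds exactly one ST, one ZIP and one AGE tuple.

```pig
-- Sketch only: relation and field names come from the script above;
-- EXTRACTED, st_rows, zip_rows, age_rows and the output path are made up.
EXTRACTED = FOREACH FINAL_DISTINCT {
    -- Pick the single-field rows out of each cogrouped bag.
    st_rows  = FILTER A_JOIN_F BY A_FIELD_ID == 'ST';
    zip_rows = FILTER A_JOIN_F BY A_FIELD_ID == 'ZIP';
    age_rows = FILTER B_JOIN_F BY B_FIELD_ID == 'AGE';
    -- Project the value column of each filtered bag and un-nest it.
    GENERATE FLATTEN(st_rows.A_FIELD_VALUE)  AS st,
             FLATTEN(zip_rows.A_FIELD_VALUE) AS zip,
             FLATTEN(age_rows.B_FIELD_VALUE) AS age;
};
-- Write the rows as comma-separated text.
STORE EXTRACTED INTO 'st_zip_age_out' USING PigStorage(',');
```

Note that FLATTEN of an empty bag drops the whole row, and a bag with several matching tuples yields one output row per combination, so groups that miss a field or repeat one would need extra handling.
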
On Thu, Apr 12, 2012 at 7:37 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> It's not clear to me what exactly you are trying to accomplish. Could you
> provide some sample inputs and expected outputs?
>
> You can use filter inside a foreach:
>
> Foreach foo { a = filter bag_in_foo by condition; generate a; }
>
> On Apr 11, 2012, at 5:27 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
> > I am new to Pig and I have gone through the reference. I am getting used to
> > how this works, but I keep getting questions as I write my scripts. I have a
> > couple of questions:
> >
> > i) Can I use FILTER with FOREACH? Below I am trying to FILTER, JOIN and MERGE
> > into one row. But in the end I get all the fields in the form of a row that
> > seems to have bags inside tuples. In the end all I want is to output the
> > values of some of the fields from each row in "a,b,c" format. How can I do that?
> >
> >
> > NM_CT_ST_FILTER = FILTER A by (FIELD_ID == 'NAM2' OR FIELD_ID == 'CITY' OR
> > FIELD_ID == 'ST' OR FIELD_ID == 'ZIP');
> >
> > AG_OC_MT_FILTER = FILTER A by (FIELD_ID == 'AGE' OR FIELD_ID == 'OCCUP' OR
> > FIELD_ID == 'MARITAL') AND FORM_ID == 'FPERSWKS' AND FORM_COPY_NUM == '1';