Pig, mail # user - FILTER and fields from tuple/bags


Re: FILTER and fields from tuple/bags
Mohit Anchlia 2012-04-13, 22:41
This is my Pig script so far, and it produces the output below. What I want
is to arrange the values in this order: NC,28613,55.

My question is: how can I extract specific fields from the bags and tuples
in this relation? Essentially I want to do something like:

foreach rel GENERATE FIELD == ST, FIELD == ZIP, FIELD == AGE

That is, I want the fields in this order from a given relation. The problem
is that the data is arranged in bags that each hold multiple tuples:

(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,04/03/12 11:36:25)
{(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,ST,NC),
 (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,ZIP,28613),
 (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,CITY,Xxxxxxx),
 (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,NAM2,Xxxxx X &xxx; Xxxxx X Xxxxxx)}
{(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,PER,WKS,04/03/12 11:36:25,AGE,55),
 (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,PER,WKS,04/03/12 11:36:25,OCCUP,xxxxxxx xxxxx),
 (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,S2011US1040PER,WKS,04/03/12 11:36:25,MARITAL,Married)}

Snippet of the script:

D = FILTER A BY F_ID == 'FINFOWKS' AND FIELD_ID == 'TSN';
NM_CT_ST_FILTER = FILTER A BY (FIELD_ID == 'NAM2' OR FIELD_ID == 'CITY' OR
    FIELD_ID == 'ST' OR FIELD_ID == 'ZIP');
AG_OC_MT_FILTER = FILTER A BY (FIELD_ID == 'AGE' OR FIELD_ID == 'OCCUP' OR
    FIELD_ID == 'MARITAL') AND F_ID == 'WKS' AND F_COPY_NUM == '1';
NM_CT_ST_FIELDS = FOREACH NM_CT_ST_FILTER GENERATE
    FILE_NAME AS A_FILE_NAME, F_ID AS A_F_ID, FSET_ID AS A_FSET_ID,
    F_ID_ROOT AS A_F_ID_ROOT, CREATED_DATE AS A_CREATED_DATE,
    FIELD_ID AS A_FIELD_ID, FIELD_VALUE AS A_FIELD_VALUE;
AG_OC_MT_FIELDS = FOREACH AG_OC_MT_FILTER GENERATE
    FILE_NAME AS B_FILE_NAME, F_ID AS B_F_ID, FSET_ID AS B_FSET_ID,
    F_ID_ROOT AS B_F_ID_ROOT, CREATED_DATE AS B_CREATED_DATE,
    FIELD_ID AS B_FIELD_ID, FIELD_VALUE AS B_FIELD_VALUE;
A_JOIN = JOIN NM_CT_ST_FIELDS BY (A_FILE_NAME, A_CREATED_DATE, A_F_ID, A_F_ID_ROOT),
    D BY (FILE_NAME, CREATED_DATE, F_ID, F_ID_ROOT);
B_JOIN = JOIN AG_OC_MT_FIELDS BY (B_FILE_NAME, B_CREATED_DATE),
    D BY (FILE_NAME, CREATED_DATE);
A_JOIN_F = FOREACH A_JOIN GENERATE A_FILE_NAME, A_F_ID, A_FSET_ID,
    A_F_ID_ROOT, A_CREATED_DATE, A_FIELD_ID, A_FIELD_VALUE, FIELD_VALUE;
B_JOIN_F = FOREACH B_JOIN GENERATE B_FILE_NAME, B_F_ID, B_FSET_ID,
    B_F_ID_ROOT, B_CREATED_DATE, B_FIELD_ID, B_FIELD_VALUE;
FINAL = COGROUP A_JOIN_F BY (A_FILE_NAME, A_CREATED_DATE),
    B_JOIN_F BY (B_FILE_NAME, B_CREATED_DATE);
FINAL_DISTINCT = DISTINCT FINAL;

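
For reference, the nested FILTER suggested in the quoted reply below might be applied to this relation roughly as follows. This is only a sketch, not a tested solution: it assumes the two bags in FINAL_DISTINCT keep the names A_JOIN_F and B_JOIN_F, that A_FIELD_ID, A_FIELD_VALUE, B_FIELD_ID and B_FIELD_VALUE can be referenced inside the bags without a relation prefix, and that each group holds exactly one ST, one ZIP and one AGE tuple. The relation name ORDERED and the output path are hypothetical.

```pig
-- Sketch only: pull ST, ZIP and AGE out of the cogrouped bags, in that order.
ORDERED = FOREACH FINAL_DISTINCT {
    -- Nested FILTERs pick the single tuple of interest from each bag.
    st_rows  = FILTER A_JOIN_F BY A_FIELD_ID == 'ST';
    zip_rows = FILTER A_JOIN_F BY A_FIELD_ID == 'ZIP';
    age_rows = FILTER B_JOIN_F BY B_FIELD_ID == 'AGE';
    -- FLATTEN turns each one-tuple bag into a plain field.
    GENERATE FLATTEN(st_rows.A_FIELD_VALUE)  AS st,
             FLATTEN(zip_rows.A_FIELD_VALUE) AS zip,
             FLATTEN(age_rows.B_FIELD_VALUE) AS age;
};

-- 'ordered_out' is a hypothetical output path; PigStorage(',') writes comma-separated rows.
STORE ORDERED INTO 'ordered_out' USING PigStorage(',');
```

Two caveats apply to this approach: flattening several bags in one GENERATE yields their cross product, so a group with more than one matching tuple produces several rows, and a group where any of the three filtered bags is empty produces no row at all.
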
On Thu, Apr 12, 2012 at 7:37 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> It's not clear to me what exactly you are trying to accomplish. Could you
> provide some sample inputs and expected outputs?
>
> You can use filter inside a foreach:
>
> Foreach foo { a = filter bag_in_foo by condition; generate a; }
>
> On Apr 11, 2012, at 5:27 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>
> > I am new to Pig and I have gone through the reference. I am getting used to
> > how this works, but I keep running into questions as I write my scripts. I
> > have a couple of questions:
> >
> > i) Can I use FILTER with FOREACH? Below I am trying to FILTER, JOIN and
> > MERGE into one row. But in the end I get all the fields in the form of a
> > row that seems to have bags inside tuples. All I want is to output the
> > values of some of the fields from each row in "a,b,c" format. How can I do
> > that?
> >
> >
> > NM_CT_ST_FILTER = FILTER A by (FIELD_ID == 'NAM2' OR FIELD_ID == 'CITY' OR
> > FIELD_ID == 'ST' OR FIELD_ID == 'ZIP');
> >
> > AG_OC_MT_FILTER = FILTER A by (FIELD_ID == 'AGE' OR FIELD_ID == 'OCCUP' OR
> > FIELD_ID == 'MARITAL') AND FORM_ID == 'FPERSWKS' AND FORM_COPY_NUM == '1';