Pig, mail # user - FILTER and fields from tuple/bags


Re: FILTER and fields from tuple/bags
Mohit Anchlia 2012-04-13, 23:36
This is the output I am expecting

NC,28613,55
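
For reference, one way to get a row in that shape is a nested FOREACH that filters each bag, along the lines suggested further down in this thread. This is only a sketch and was not confirmed in the thread. It assumes the relation and field names from the quoted script (FINAL, A_JOIN_F, B_JOIN_F, A_FIELD_ID, A_FIELD_VALUE, B_FIELD_ID, B_FIELD_VALUE); `result`, `st`, `zip` and `age` are made-up aliases.

```pig
-- Sketch only: assumes FINAL is the COGROUP output from the quoted script,
-- i.e. (group, A_JOIN_F: bag, B_JOIN_F: bag), and that each group holds
-- exactly one ST, one ZIP and one AGE tuple.
result = FOREACH FINAL {
    -- pick the single tuple of interest out of each bag
    st  = FILTER A_JOIN_F BY A_FIELD_ID == 'ST';
    zip = FILTER A_JOIN_F BY A_FIELD_ID == 'ZIP';
    age = FILTER B_JOIN_F BY B_FIELD_ID == 'AGE';
    -- project the value column and unnest the one-element bags
    GENERATE FLATTEN(st.A_FIELD_VALUE),
             FLATTEN(zip.A_FIELD_VALUE),
             FLATTEN(age.B_FIELD_VALUE);
};
```

Note that several FLATTENs in one GENERATE produce a cross product, so a group with more than one matching tuple yields several rows, and a group with no matching tuple for any of the three fields yields no row at all.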

On Fri, Apr 13, 2012 at 4:28 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Could you type out the actual results you would like to see? I am not
> sure what you expect the results of "foreach rel GENERATE FIELD == ST,
> FIELD == ZIP, FIELD == AGE" to look like.
>
> Also, your Pig scripts do not have to be all caps. Using something
> other than all caps will make them a lot more readable...
>
> D
>
> On Fri, Apr 13, 2012 at 3:41 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> > This is my Pig script so far, and it produces the output below. What I
> > want is to arrange the values from that output in this order: NC,28613,55.
> >
> > My question is: from this relation, how can I extract specific fields from
> > bags and tuples? Essentially I want to do something like:
> >
> > foreach rel GENERATE FIELD == ST, FIELD == ZIP, FIELD == AGE --I want
> > fields in this order from a given relation. But the problem is that the
> > data is arranged in bags that hold multiple tuples.
> >
> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,04/03/12 11:36:25)
> >
> > {(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,ST,NC),
> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,ZIP,28613),
> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,CITY,Xxxxxxx),
> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,NAM2,Xxxxx X &xxx; Xxxxx X Xxxxxx)}
> >
> > {(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,PER,WKS,04/03/12 11:36:25,AGE,55),
> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,PER,WKS,04/03/12 11:36:25,OCCUP,xxxxxxx xxxxx),
> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,S2011US1040PER,WKS,04/03/12 11:36:25,MARITAL,Married)}
> >
> > snippet of the script
> >
> > D = FILTER A by F_ID == 'FINFOWKS' AND FIELD_ID == 'TSN';
> > NM_CT_ST_FILTER = FILTER A by (FIELD_ID == 'NAM2' OR FIELD_ID == 'CITY' OR
> >     FIELD_ID == 'ST' OR FIELD_ID == 'ZIP');
> > AG_OC_MT_FILTER = FILTER A by (FIELD_ID == 'AGE' OR FIELD_ID == 'OCCUP' OR
> >     FIELD_ID == 'MARITAL') AND F_ID == 'WKS' AND F_COPY_NUM == '1';
> > NM_CT_ST_FIELDS = FOREACH NM_CT_ST_FILTER GENERATE FILE_NAME as A_FILE_NAME,
> >     F_ID as A_F_ID, FSET_ID as A_FSET_ID, F_ID_ROOT as A_F_ID_ROOT,
> >     CREATED_DATE as A_CREATED_DATE, FIELD_ID as A_FIELD_ID,
> >     FIELD_VALUE as A_FIELD_VALUE;
> > AG_OC_MT_FIELDS = FOREACH AG_OC_MT_FILTER GENERATE FILE_NAME as B_FILE_NAME,
> >     F_ID as B_F_ID, FSET_ID as B_FSET_ID, F_ID_ROOT as B_F_ID_ROOT,
> >     CREATED_DATE as B_CREATED_DATE, FIELD_ID as B_FIELD_ID,
> >     FIELD_VALUE as B_FIELD_VALUE;
> > A_JOIN = JOIN NM_CT_ST_FIELDS BY (A_FILE_NAME,A_CREATED_DATE,A_F_ID,A_F_ID_ROOT),
> >     D BY (FILE_NAME,CREATED_DATE,F_ID,F_ID_ROOT);
> > B_JOIN = JOIN AG_OC_MT_FIELDS BY (B_FILE_NAME,B_CREATED_DATE),
> >     D BY (FILE_NAME,CREATED_DATE);
> > A_JOIN_F = FOREACH A_JOIN GENERATE A_FILE_NAME, A_F_ID, A_FSET_ID,
> >     A_F_ID_ROOT, A_CREATED_DATE, A_FIELD_ID, A_FIELD_VALUE, FIELD_VALUE;
> > B_JOIN_F = FOREACH B_JOIN GENERATE B_FILE_NAME, B_F_ID, B_FSET_ID,
> >     B_F_ID_ROOT, B_CREATED_DATE, B_FIELD_ID, B_FIELD_VALUE;
> > FINAL = COGROUP A_JOIN_F BY (A_FILE_NAME,A_CREATED_DATE),
> >     B_JOIN_F BY (B_FILE_NAME,B_CREATED_DATE);
> > FINAL_DISTINCT = DISTINCT FINAL;
> >
> >
> > On Thu, Apr 12, 2012 at 7:37 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> >
> >> It's not clear to me what exactly you are trying to accomplish. Could you
> >> provide some sample inputs and expected outputs?
> >>
> >> You can use filter inside a foreach:
> >>
> >> Foreach foo { a = filter bag_in_foo by condition; generate a; }
> >>
> >> On Apr 11, 2012, at 5:27 PM, Mohit Anchlia <[EMAIL PROTECTED]>