Pig >> mail # user >> FILTER and fields from tuple/bags


Re: FILTER and fields from tuple/bags
This is the output I am expecting:

NC,28613,55
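
One way to get there, following the nested FILTER suggestion quoted further down, might look like the sketch below. It assumes FINAL_DISTINCT has the shape produced by the script quoted below (a group key plus the two bags A_JOIN_F and B_JOIN_F) and that each field ID occurs exactly once per group. The aliases picked, st_rows, zip_rows, age_rows, st, zip and age are made up for illustration.

```pig
-- Sketch only: assumes each row of FINAL_DISTINCT is
-- (group, A_JOIN_F: bag, B_JOIN_F: bag), as built by the COGROUP in the script.
picked = FOREACH FINAL_DISTINCT {
    -- Nested FILTERs keep only the tuple carrying the wanted field ID.
    st_rows  = FILTER A_JOIN_F BY A_FIELD_ID == 'ST';
    zip_rows = FILTER A_JOIN_F BY A_FIELD_ID == 'ZIP';
    age_rows = FILTER B_JOIN_F BY B_FIELD_ID == 'AGE';
    -- Project the value column of each filtered bag and flatten it,
    -- so the values come out as plain fields in the requested order.
    GENERATE FLATTEN(st_rows.A_FIELD_VALUE)  AS st,
             FLATTEN(zip_rows.A_FIELD_VALUE) AS zip,
             FLATTEN(age_rows.B_FIELD_VALUE) AS age;
};
DUMP picked;
```

FLATTEN on several bags produces their cross product, so a group where one of the filtered bags is empty drops out of the result, and a group with duplicate field IDs yields more than one row.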

On Fri, Apr 13, 2012 at 4:28 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Could you type out the actual results you would like to see? I am not
> sure what you expect the results of "foreach rel GENERATE FIELD == ST,
> FIELD == ZIP, FIELD == AGE" to look like.
>
> Also, your Pig scripts do not have to be all caps. Using something
> other than all caps will make them a lot more readable...
>
> D
>
> On Fri, Apr 13, 2012 at 3:41 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> > My Pig script so far produces the output below. What I want is to pull
> > the values out of it in this order: NC,28613,55.
> >
> > My question is: how can I extract specific fields from the bags and
> > tuples in this relation? Essentially I want to do something like:
> >
> > foreach rel GENERATE FIELD == ST, FIELD == ZIP, FIELD == AGE
> >
> > That is, I want the fields in this order from a given relation. The
> > problem is that the values are spread across bags that each hold
> > multiple tuples.
> >
> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,04/03/12 11:36:25)
> >
> > {(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,ST,NC),
> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,ZIP,28613),
> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,CITY,Xxxxxxx),
> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,NAM2,Xxxxx X &xxx; Xxxxx X Xxxxxx)}
> >
> > {(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,PER,WKS,04/03/12 11:36:25,AGE,55),
> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,PER,WKS,04/03/12 11:36:25,OCCUP,xxxxxxx xxxxx),
> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,S2011US1040PER,WKS,04/03/12 11:36:25,MARITAL,Married)}
> >
> > Snippet of the script:
> >
> > D = FILTER A by F_ID == 'FINFOWKS' AND FIELD_ID == 'TSN';
> > NM_CT_ST_FILTER = FILTER A by (FIELD_ID == 'NAM2' OR FIELD_ID == 'CITY' OR
> >     FIELD_ID == 'ST' OR FIELD_ID == 'ZIP');
> > AG_OC_MT_FILTER = FILTER A by (FIELD_ID == 'AGE' OR FIELD_ID == 'OCCUP' OR
> >     FIELD_ID == 'MARITAL') AND F_ID == 'WKS' AND F_COPY_NUM == '1';
> > NM_CT_ST_FIELDS = FOREACH NM_CT_ST_FILTER GENERATE FILE_NAME as A_FILE_NAME,
> >     F_ID as A_F_ID, FSET_ID as A_FSET_ID, F_ID_ROOT as A_F_ID_ROOT,
> >     CREATED_DATE as A_CREATED_DATE, FIELD_ID as A_FIELD_ID,
> >     FIELD_VALUE as A_FIELD_VALUE;
> > AG_OC_MT_FIELDS = FOREACH AG_OC_MT_FILTER GENERATE FILE_NAME as B_FILE_NAME,
> >     F_ID as B_F_ID, FSET_ID as B_FSET_ID, F_ID_ROOT as B_F_ID_ROOT,
> >     CREATED_DATE as B_CREATED_DATE, FIELD_ID as B_FIELD_ID,
> >     FIELD_VALUE as B_FIELD_VALUE;
> > A_JOIN = JOIN NM_CT_ST_FIELDS BY (A_FILE_NAME,A_CREATED_DATE,A_F_ID,A_F_ID_ROOT),
> >     D BY (FILE_NAME,CREATED_DATE,F_ID,F_ID_ROOT);
> > B_JOIN = JOIN AG_OC_MT_FIELDS BY (B_FILE_NAME,B_CREATED_DATE),
> >     D BY (FILE_NAME,CREATED_DATE);
> > A_JOIN_F = FOREACH A_JOIN GENERATE A_FILE_NAME, A_F_ID, A_FSET_ID,
> >     A_F_ID_ROOT, A_CREATED_DATE, A_FIELD_ID, A_FIELD_VALUE, FIELD_VALUE;
> > B_JOIN_F = FOREACH B_JOIN GENERATE B_FILE_NAME, B_F_ID, B_FSET_ID,
> >     B_F_ID_ROOT, B_CREATED_DATE, B_FIELD_ID, B_FIELD_VALUE;
> > FINAL = COGROUP A_JOIN_F BY (A_FILE_NAME,A_CREATED_DATE),
> >     B_JOIN_F BY (B_FILE_NAME,B_CREATED_DATE);
> > FINAL_DISTINCT = DISTINCT FINAL;
> >
> >
> > On Thu, Apr 12, 2012 at 7:37 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> >
> >> It's not clear to me what exactly you are trying to accomplish. Could you
> >> provide some sample inputs and expected outputs?
> >>
> >> You can use filter inside a foreach:
> >>
> >> Foreach foo { a = filter bag_in_foo by condition; generate a; }
> >>
> >> On Apr 11, 2012, at 5:27 PM, Mohit Anchlia <[EMAIL PROTECTED]>