-
FILTER and fields from tuple/bags
Mohit Anchlia 2012-04-12, 00:27
I am new to Pig and have gone through the reference. I am getting used to how it works, but I keep running into questions as I write my scripts. I have a couple of questions:
1) Can I use FILTER with FOREACH? Below I am trying to FILTER, JOIN and merge the results into one row, but in the end I get all the fields in a single row that seems to have bags inside tuples. All I want is to output the values of some of the fields from each row in "a,b,c" format. How can I do that?
NM_CT_ST_FILTER = FILTER A by (FIELD_ID == 'NAM2' OR FIELD_ID == 'CITY' OR FIELD_ID == 'ST' OR FIELD_ID == 'ZIP');
AG_OC_MT_FILTER = FILTER A by (FIELD_ID == 'AGE' OR FIELD_ID == 'OCCUP' OR FIELD_ID == 'MARITAL') AND FORM_ID == 'FPERSWKS' AND FORM_COPY_NUM == '1';
NM_CT_ST = JOIN NM_CT_ST_FILTER BY (FILE_NAME,CREATED_DATE), D BY (FILE_NAME,CREATED_DATE);
AG_OC_MT = JOIN AG_OC_MT_FILTER BY (FILE_NAME,CREATED_DATE,FORM_ID,FORM_ID_ROOT), D BY (FILE_NAME,CREATED_DATE,FORM_ID,FORM_ID_ROOT);
FINAL = COGROUP NM_CT_ST BY (D::FILE_NAME,D::CREATED_DATE), AG_OC_MT BY (D::FILE_NAME,D::CREATED_DATE);
2) Is it possible to use FILTER inside FOREACH? Something like: foreach A GENERATE B FILTER FIELD BY .. OR FIELD BY ..
-
Re: FILTER and fields from tuple/bags
Mohit Anchlia 2012-04-12, 21:14
Could someone please help me answer the questions in my previous message?
-
Re: FILTER and fields from tuple/bags
Dmitriy Ryaboy 2012-04-13, 02:37
It's not clear to me what exactly you are trying to accomplish. Could you provide some sample inputs and expected outputs?
You can use filter inside a foreach:
result = foreach foo { a = filter bag_in_foo by condition; generate a; };
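A minimal sketch of the nested FILTER described above, applied to this thread's data. It assumes the flat input relation `A` has the columns used in the scripts in this thread; the load path `'input'`, the tab delimiter, the column types, and the aliases `by_file`, `wanted` and `addr` are hypothetical.

```pig
-- Hypothetical input: path, delimiter and types are assumptions;
-- column names follow the script posted in this thread.
A = LOAD 'input' USING PigStorage('\t')
    AS (FILE_NAME:chararray, F_ID:chararray, FSET_ID:chararray,
        F_ID_ROOT:chararray, CREATED_DATE:chararray, FIELD_ID:chararray,
        FIELD_VALUE:chararray, F_COPY_NUM:chararray);

-- Collect all rows for the same file/date into one bag (named A).
by_file = GROUP A BY (FILE_NAME, CREATED_DATE);

-- Nested FOREACH: the FILTER runs over the bag A inside each group.
addr = FOREACH by_file {
    wanted = FILTER A BY FIELD_ID == 'ST' OR FIELD_ID == 'ZIP';
    GENERATE group, wanted.(FIELD_ID, FIELD_VALUE);
};

DUMP addr;
```

After GROUP, each record holds the grouping key in `group` and a bag named after the grouped relation (here `A`); the nested FILTER only sees that bag, one group at a time.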
-
Re: FILTER and fields from tuple/bags
Mohit Anchlia 2012-04-13, 22:41
Below is my Pig script so far and the output it gives me. What I want to do is pull values out of that output and arrange them in this order: NC,28613,55.
My question is: how can I extract specific fields from the bags and tuples in this relation? Essentially I want to do something like:
foreach rel GENERATE FIELD == ST, FIELD == ZIP, FIELD == AGE -- I want the fields in this order from a given relation.
The problem is that the values are arranged in bags of multiple tuples:
(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,04/03/12 11:36:25)
{(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,ST,NC),(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,ZIP,28613),(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,CITY,Xxxxxxx),(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,NAM2,Xxxxx X &xxx; Xxxxx X Xxxxxx)}
{(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,PER,WKS,04/03/12 11:36:25,AGE,55),(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,PER,WKS,04/03/12 11:36:25,OCCUP,xxxxxxx xxxxx),(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,S2011US1040PER,WKS,04/03/12 11:36:25,MARITAL,Married)}
Snippet of the script:
D = FILTER A by F_ID == 'FINFOWKS' AND FIELD_ID == 'TSN';
NM_CT_ST_FILTER = FILTER A by (FIELD_ID == 'NAM2' OR FIELD_ID == 'CITY' OR FIELD_ID == 'ST' OR FIELD_ID == 'ZIP');
AG_OC_MT_FILTER = FILTER A by (FIELD_ID == 'AGE' OR FIELD_ID == 'OCCUP' OR FIELD_ID == 'MARITAL') AND F_ID == 'WKS' AND F_COPY_NUM == '1';
NM_CT_ST_FIELDS = FOREACH NM_CT_ST_FILTER GENERATE FILE_NAME as A_FILE_NAME, F_ID as A_F_ID, FSET_ID as A_FSET_ID, F_ID_ROOT as A_F_ID_ROOT, CREATED_DATE as A_CREATED_DATE, FIELD_ID as A_FIELD_ID, FIELD_VALUE as A_FIELD_VALUE;
AG_OC_MT_FIELDS = FOREACH AG_OC_MT_FILTER GENERATE FILE_NAME as B_FILE_NAME, F_ID as B_F_ID, FSET_ID as B_FSET_ID, F_ID_ROOT as B_F_ID_ROOT, CREATED_DATE as B_CREATED_DATE, FIELD_ID as B_FIELD_ID, FIELD_VALUE as B_FIELD_VALUE;
A_JOIN = JOIN NM_CT_ST_FIELDS BY (A_FILE_NAME,A_CREATED_DATE,A_F_ID,A_F_ID_ROOT), D BY (FILE_NAME,CREATED_DATE,F_ID,F_ID_ROOT);
B_JOIN = JOIN AG_OC_MT_FIELDS BY (B_FILE_NAME,B_CREATED_DATE), D BY (FILE_NAME,CREATED_DATE);
A_JOIN_F = FOREACH A_JOIN GENERATE A_FILE_NAME, A_F_ID, A_FSET_ID, A_F_ID_ROOT, A_CREATED_DATE, A_FIELD_ID, A_FIELD_VALUE, FIELD_VALUE;
B_JOIN_F = FOREACH B_JOIN GENERATE B_FILE_NAME, B_F_ID, B_FSET_ID, B_F_ID_ROOT, B_CREATED_DATE, B_FIELD_ID, B_FIELD_VALUE;
FINAL = COGROUP A_JOIN_F BY (A_FILE_NAME,A_CREATED_DATE), B_JOIN_F BY (B_FILE_NAME,B_CREATED_DATE);
FINAL_DISTINCT = DISTINCT FINAL;
-
Re: FILTER and fields from tuple/bags
Dmitriy Ryaboy 2012-04-13, 23:28
Could you type out the actual results you would like to see? I am not sure what you expect the results of "foreach rel GENERATE FIELD == ST, FIELD == ZIP, FIELD == AGE" to look like.
Also, your Pig scripts do not have to be all caps. Using something other than all caps will make them a lot more readable...
D
-
Re: FILTER and fields from tuple/bags
Mohit Anchlia 2012-04-13, 23:36
This is the output I am expecting:
NC,28613,55
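One way to get output shaped like `NC,28613,55` is to group the raw rows by file and date, pick each wanted field out of the bag with a nested FILTER, and FLATTEN the results into columns in the desired order. This is a sketch, not a tested script: it assumes the flat input relation `A` has the columns from the script posted in this thread, and the load path, delimiter, types, aliases and output directory `'st_zip_age'` are hypothetical.

```pig
-- Hypothetical input; column names follow the thread's script.
A = LOAD 'input' USING PigStorage('\t')
    AS (FILE_NAME:chararray, F_ID:chararray, FSET_ID:chararray,
        F_ID_ROOT:chararray, CREATED_DATE:chararray, FIELD_ID:chararray,
        FIELD_VALUE:chararray, F_COPY_NUM:chararray);

-- One record per file/date, holding all of that file's rows in bag A.
by_file = GROUP A BY (FILE_NAME, CREATED_DATE);

pivoted = FOREACH by_file {
    -- One small bag per wanted field, filtered inside the group.
    st_rows  = FILTER A BY FIELD_ID == 'ST';
    zip_rows = FILTER A BY FIELD_ID == 'ZIP';
    age_rows = FILTER A BY FIELD_ID == 'AGE' AND F_ID == 'WKS' AND F_COPY_NUM == '1';
    -- FLATTEN turns each bag of values into a plain column, in ST, ZIP, AGE order.
    GENERATE FLATTEN(st_rows.FIELD_VALUE),
             FLATTEN(zip_rows.FIELD_VALUE),
             FLATTEN(age_rows.FIELD_VALUE);
};

-- PigStorage(',') writes each output tuple as a comma-separated line.
STORE pivoted INTO 'st_zip_age' USING PigStorage(',');
```

Several FLATTENs in one GENERATE produce the cross product of the bags, so this yields exactly one line per file only when each filtered bag holds exactly one tuple. A file with no ZIP row, for example, produces no output at all, and duplicate matching rows multiply the output.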
-
Re: FILTER and fields from tuple/bags
Mohit Anchlia 2012-04-16, 14:57
Could someone help with this? Is FLATTEN the best approach in this case, or am I doing something entirely wrong?
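If the existing pipeline is kept, FLATTEN can be applied directly to the COGROUP result. The sketch below is meant to be appended to the script posted earlier in this thread and reuses its aliases (`FINAL`, `A_JOIN_F`, `B_JOIN_F`, `A_FIELD_ID`, `A_FIELD_VALUE`, `B_FIELD_ID`, `B_FIELD_VALUE`); the new aliases `st_zip_age`, `st_rows`, `zip_rows`, `age_rows` and the output directory are hypothetical.

```pig
-- Appended to the thread's script; FINAL has the shape
-- (group, {A_JOIN_F tuples}, {B_JOIN_F tuples}).
st_zip_age = FOREACH FINAL {
    -- ST and ZIP come from the first bag, AGE from the second.
    st_rows  = FILTER A_JOIN_F BY A_FIELD_ID == 'ST';
    zip_rows = FILTER A_JOIN_F BY A_FIELD_ID == 'ZIP';
    age_rows = FILTER B_JOIN_F BY B_FIELD_ID == 'AGE';
    -- FLATTEN each bag of values into its own column, in ST, ZIP, AGE order.
    GENERATE FLATTEN(st_rows.A_FIELD_VALUE),
             FLATTEN(zip_rows.A_FIELD_VALUE),
             FLATTEN(age_rows.B_FIELD_VALUE);
};

-- Hypothetical output directory; PigStorage(',') writes comma-separated lines.
STORE st_zip_age INTO 'st_zip_age' USING PigStorage(',');
```

Here FLATTEN is what turns the bags into plain columns; the same cross-product rule applies, so if a bag holds more than one matching tuple for a group, the output rows multiply.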