Pig, mail # user - FILTER and fields from tuple/bags


Re: FILTER and fields from tuple/bags
Mohit Anchlia 2012-04-16, 14:57
Could someone help with this? Is FLATTEN the best approach in this case,
or am I doing something entirely wrong?
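
For reference, one possible approach is a nested FOREACH over the COGROUP result, filtering each bag by field id and flattening the matching values. This is only a sketch, not a tested answer. It assumes FINAL has the shape (group, A_JOIN_F bag, B_JOIN_F bag) produced by the script quoted below; the aliases ST_ZIP_AGE, st_rows, zip_rows, age_rows and the output names st_value, zip_value, age_value are hypothetical. After the JOINs, the field names inside the bags may need their `relation::` prefix if Pig reports them as ambiguous.

```pig
-- Sketch only: assumes FINAL is the COGROUP output from the quoted script,
-- i.e. each row is (group, A_JOIN_F: bag, B_JOIN_F: bag).
-- All aliases introduced here are hypothetical.
ST_ZIP_AGE = FOREACH FINAL {
    -- Keep only the tuple carrying each wanted field id.
    st_rows  = FILTER A_JOIN_F BY A_FIELD_ID == 'ST';
    zip_rows = FILTER A_JOIN_F BY A_FIELD_ID == 'ZIP';
    age_rows = FILTER B_JOIN_F BY B_FIELD_ID == 'AGE';
    -- Project the value column of each filtered bag and flatten it,
    -- emitting the columns in the desired order.
    GENERATE FLATTEN(st_rows.A_FIELD_VALUE)  AS st_value,
             FLATTEN(zip_rows.A_FIELD_VALUE) AS zip_value,
             FLATTEN(age_rows.B_FIELD_VALUE) AS age_value;
};
```

Two caveats with this shape: FLATTEN on an empty bag drops the whole row, so a group missing ST, ZIP, or AGE would produce no output, and a group with several matches for one field id would produce one row per combination.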

On Fri, Apr 13, 2012 at 4:36 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> This is the output I am expecting
>
> NC,28613,55
>
> On Fri, Apr 13, 2012 at 4:28 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>
>> Could you type out the actual results you would like to see? I am not
>> sure what you expect the results of "foreach rel GENERATE FIELD == ST,
>> FIELD == ZIP, FIELD == AGE" to look like.
>>
>> Also, your Pig scripts do not have to be all caps. Using something
>> other than all caps will make them a lot more readable...
>>
>> D
>>
>> On Fri, Apr 13, 2012 at 3:41 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>> > This is my Pig script so far, and it does produce output. What I want is
>> > to arrange the values from the output below in this order: NC,28613,55.
>> >
>> > My question is: how can I extract specific fields from the bags and
>> > tuples in this relation? Essentially I want to do something like:
>> >
>> > foreach rel GENERATE FIELD == ST, FIELD == ZIP, FIELD == AGE --I want
>> > fields in this order from a given relation. But the problem is that the
>> > data is arranged in bags containing multiple tuples.
>> >
>> >
>> >
>> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,04/03/12 11:36:25)
>> >
>> > {(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,ST,NC),
>> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,ZIP,28613),
>> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,CITY,Xxxxxxx),
>> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,FINFOWKS,PER,FINFOWKS,04/03/12 11:36:25,NAM2,Xxxxx X &xxx; Xxxxx X Xxxxxx)}
>> >
>> > {(1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,PER,WKS,04/03/12 11:36:25,AGE,55),
>> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,PER,WKS,04/03/12 11:36:25,OCCUP,xxxxxxx xxxxx),
>> > (1333477861077/home/hadoop/pigtest/./formml_dat/999000093_tax_return.xml,WKS,S2011US1040PER,WKS,04/03/12 11:36:25,MARITAL,Married)}
>> >
>> > snippet of the script
>> >
>> > D = FILTER A by F_ID == 'FINFOWKS' AND FIELD_ID == 'TSN';
>> > NM_CT_ST_FILTER = FILTER A by (FIELD_ID == 'NAM2' OR FIELD_ID == 'CITY' OR
>> >     FIELD_ID == 'ST' OR FIELD_ID == 'ZIP');
>> > AG_OC_MT_FILTER = FILTER A by (FIELD_ID == 'AGE' OR FIELD_ID == 'OCCUP' OR
>> >     FIELD_ID == 'MARITAL') AND F_ID == 'WKS' AND F_COPY_NUM == '1';
>> > NM_CT_ST_FIELDS = FOREACH NM_CT_ST_FILTER GENERATE FILE_NAME as A_FILE_NAME,
>> >     F_ID as A_F_ID, FSET_ID as A_FSET_ID, F_ID_ROOT as A_F_ID_ROOT,
>> >     CREATED_DATE as A_CREATED_DATE, FIELD_ID as A_FIELD_ID,
>> >     FIELD_VALUE as A_FIELD_VALUE;
>> > AG_OC_MT_FIELDS = FOREACH AG_OC_MT_FILTER GENERATE FILE_NAME as B_FILE_NAME,
>> >     F_ID as B_F_ID, FSET_ID as B_FSET_ID, F_ID_ROOT as B_F_ID_ROOT,
>> >     CREATED_DATE as B_CREATED_DATE, FIELD_ID as B_FIELD_ID,
>> >     FIELD_VALUE as B_FIELD_VALUE;
>> > A_JOIN = JOIN NM_CT_ST_FIELDS BY (A_FILE_NAME,A_CREATED_DATE,A_F_ID,A_F_ID_ROOT),
>> >     D BY (FILE_NAME,CREATED_DATE,F_ID,F_ID_ROOT);
>> > B_JOIN = JOIN AG_OC_MT_FIELDS BY (B_FILE_NAME,B_CREATED_DATE),
>> >     D BY (FILE_NAME,CREATED_DATE);
>> > A_JOIN_F = FOREACH A_JOIN GENERATE A_FILE_NAME, A_F_ID, A_FSET_ID,
>> >     A_F_ID_ROOT, A_CREATED_DATE, A_FIELD_ID, A_FIELD_VALUE, FIELD_VALUE;
>> > B_JOIN_F = FOREACH B_JOIN GENERATE B_FILE_NAME, B_F_ID, B_FSET_ID,
>> >     B_F_ID_ROOT, B_CREATED_DATE, B_FIELD_ID, B_FIELD_VALUE;
>> > FINAL = COGROUP A_JOIN_F BY (A_FILE_NAME,A_CREATED_DATE),
>> >     B_JOIN_F BY (B_FILE_NAME,B_CREATED_DATE);
>> > FINAL_DISTINCT = DISTINCT FINAL;
>> >
>> >
>> > On Thu, Apr 12, 2012 at 7:37 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>> >
>> >> It's not clear to me what exactly you are trying to accomplish. Could