Pig, mail # user - Question on filter inside FOREACH inner loop


RE: Question on filter inside FOREACH inner loop
Santhosh Srinivasan 2009-07-07, 21:04
It works with the latest trunk and should work with 0.3.0. A sample run
follows:

grunt> a = load '/user/sms/data/filter.data' as (field1, field2);
grunt> dump a;

(a,10)
(a,10)
(a,11)
(b,10)
(b,10)
(a,10)
(a,10)
(a,10)
(b,12)
(c,13)
(c,14)

grunt> b = group a by field1;
grunt> dump b;

(a,{(a,10),(a,10),(a,11),(a,10),(a,10),(a,10)})
(b,{(b,10),(b,10),(b,12)})
(c,{(c,13),(c,14)})

grunt> c = foreach b { filter1 = filter a by (field2 != 10 and field2 != 11); generate group, COUNT(filter1);};
grunt> dump c;

(a,0L)
(b,1L)
(c,2L)
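
As a cross-check on the counts above, here is a minimal Python sketch of what the nested filter followed by COUNT is expected to compute for this data. It is only a model of the semantics, not Pig code and not how Pig implements it; the function name `nested_filter_count` and the variable names are made up for illustration, and the output is sorted by key for readability, which Pig does not guarantee.

```python
from collections import defaultdict

# Rows copied from the `dump a` output above: (field1, field2).
rows = [("a", 10), ("a", 10), ("a", 11), ("b", 10), ("b", 10),
        ("a", 10), ("a", 10), ("a", 10), ("b", 12), ("c", 13), ("c", 14)]


def nested_filter_count(rows, predicate):
    """Group by field1, filter each group's bag, and count the survivors."""
    groups = defaultdict(list)
    for field1, field2 in rows:
        groups[field1].append(field2)
    # A group whose filtered bag is empty still yields a row, with count 0.
    return [(key, sum(1 for value in bag if predicate(value)))
            for key, bag in sorted(groups.items())]


counts = nested_filter_count(rows, lambda v: v != 10 and v != 11)
print(counts)
```

The point of the model is the last comment in the function: a group whose filtered bag is empty should still produce a row with a count of 0, and the remaining groups should still be processed.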

Santhosh
-----Original Message-----
From: Alan Gates [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 07, 2009 12:22 PM
To: [EMAIL PROTECTED]
Subject: Re: Question on filter inside FOREACH inner loop

I suspect this is PIG-514 (http://issues.apache.org/jira/browse/PIG-514),
which was resolved in Pig 0.3.0. Can you try your script with that version
and see if it resolves the problem?

Alan.

On Jul 7, 2009, at 7:13 AM, Gururaj S Mayya wrote:

> Hi,
>
> Some more observations on the same issue. The example is modified from
> piglatin.pdf, so we are not sure what we are missing.
>
> The FOREACH stops processing as soon as the inner bag is empty due to the
> filter.
>
> Say we have four rows going into the FOREACH, of which Row 1 and Row 3
> match the filter criteria.
>
> Row 1 has an inner bag with tuples that satisfy the filter condition, so
> the result is not an empty bag. This row is output.
> Row 2 has an inner bag with no tuples satisfying the filter condition.
> This row is not output, and the FOREACH STOPS. No further rows are checked.
>
> thx
> ~gururaj
>
> On Tue, Jul 7, 2009 at 7:25 PM, Gururaj S Mayya <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I am filtering records based on the values in the inner bag, inside the
>> inner loop of a FOREACH, as shown in the following sequence of Pig
>> statements. I get weird results depending on whether the first row
>> satisfies the filter condition or not.
>>
>> I am using pig-0.2.0.jar downloaded from the Pig site. The scripts are
>> running in local mode.
>>
>>
>> grunt> a = load 'c' as (first:chararray,second:int);
>>
>> grunt> dump a;
>> 2009-07-07 19:16:34,549 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
>> 2009-07-07 19:16:34,549 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
>> (a,10)
>> (a,10)
>> (a,11)
>> (b,10)
>> (b,10)
>> (a,10)
>> (a,10)
>> (a,10)
>> (b,12)
>> (c,13)
>> (c,14)
>> grunt> b = group a by first;
>> grunt> dump b;
>> 2009-07-07 19:16:41,703 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
>> 2009-07-07 19:16:41,703 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
>> (a,{(a,10),(a,10),(a,11),(a,10),(a,10),(a,10)})
>> (b,{(b,10),(b,10),(b,12)})
>> (c,{(c,13),(c,14)})
>>
>> I am applying a filter on the inner bag that lets the first group (the
>> one with key 'a') be part of the result. I get the results as expected.
>> No issues.
>>
>> grunt> c = foreach b {
>>>> filter1 = filter a by (second == 10);
>>>> generate group,COUNT(filter1);
>>>> }
>> grunt> dump c;
>> 2009-07-07 19:18:54,835 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
>> 2009-07-07 19:18:54,835 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
>> (a,5L)
>> (b,2L)
>>
>>
>> Now assume I apply a filter that the first group (the one with key 'a')
>> does not satisfy.
>>
>> I don't get any output! Ideally the keys 'b' and 'c' should have appeared
>> in the result set.
>>
>> grunt> c = foreach b {
>>>> filter1 = filter a by (second != 10 and second != 11);
>>>> generate group,COUNT(filter1);
>>>> }
>> grunt> dump c;
>> 2009-07-07 19:18:24,681 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
>> 2009-07-07 19:18:24,681 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher -