Pig >> mail # user >> Question on filter inside FOREACH inner loop
RE: Question on filter inside FOREACH inner loop
It works with the latest trunk and should work with 0.3.0. A sample run
follows:

grunt> a = load '/user/sms/data/filter.data' as (field1, field2);
grunt> dump a;

(a,10)
(a,10)
(a,11)
(b,10)
(b,10)
(a,10)
(a,10)
(a,10)
(b,12)
(c,13)
(c,14)

grunt> b = group a by field1;
grunt> dump b;

(a,{(a,10),(a,10),(a,11),(a,10),(a,10),(a,10)})
(b,{(b,10),(b,10),(b,12)})
(c,{(c,13),(c,14)})

grunt> c = foreach b { filter1 = filter a by (field2 != 10 and field2 != 11); generate group, COUNT(filter1);};
grunt> dump c;

(a,0L)
(b,1L)
(c,2L)
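
As a cross-check on the expected counts, here is a minimal sketch in plain
Python (not Pig) that models the group / inner filter / COUNT steps on the
sample data above. The names ROWS and group_filter_count are made up for
illustration. It assumes, as the run above shows for key a, that a group whose
filtered bag is empty still yields a row with a count of 0.

```python
from collections import defaultdict

# Rows copied from the `dump a` output in this thread.
ROWS = [("a", 10), ("a", 10), ("a", 11), ("b", 10), ("b", 10), ("a", 10),
        ("a", 10), ("a", 10), ("b", 12), ("c", 13), ("c", 14)]


def group_filter_count(rows, keep):
    """Model of: group by the first field, filter each inner bag, count survivors.

    Hypothetical helper, not part of Pig. `keep` is a predicate on the
    second field.
    """
    bags = defaultdict(list)
    for key, value in rows:
        bags[key].append(value)
    # Assumption: every group yields one row, even if its filtered bag is empty.
    return [(key, sum(1 for v in bag if keep(v)))
            for key, bag in sorted(bags.items())]


# Filter from the failing case: second field is neither 10 nor 11.
print(group_filter_count(ROWS, lambda v: v != 10 and v != 11))
# [('a', 0), ('b', 1), ('c', 2)]

# Filter from the first case: second field equals 10.
print(group_filter_count(ROWS, lambda v: v == 10))
# [('a', 5), ('b', 2), ('c', 0)]
```

With the second predicate the model gives a count of 0 for c. The 0.2.0 output
quoted below has no row for c at all, which fits the empty-bag behaviour
described in this thread.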

Santhosh
-----Original Message-----
From: Alan Gates [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 07, 2009 12:22 PM
To: [EMAIL PROTECTED]
Subject: Re: Question on filter inside FOREACH inner loop

I suspect this is PIG-514 (http://issues.apache.org/jira/browse/PIG-514),
which was resolved in Pig 0.3.0. Can you try your script with that version
and see if it resolves the problem?

Alan.

On Jul 7, 2009, at 7:13 AM, Gururaj S Mayya wrote:

> Hi,
>
> Some more observations on the same. The example is modified from the
> piglatin.pdf, so not sure what we are missing.
>
> The FOREACH stops processing as soon as the inner bag is empty due to the
> filter.
>
> Say we have four rows going into the FOREACH, of which Row 1 and Row 3 match
> the filter criteria.
>
> Row 1 has an inner bag with tuples that satisfy the filter condition, so the
> result is not an empty bag. -- This row is output.
> Row 2 has an inner bag with no tuples satisfying the filter condition. -- This
> row is not output and the FOREACH STOPS. No further rows are checked.
>
> thx
> ~gururaj
>
> On Tue, Jul 7, 2009 at 7:25 PM, Gururaj S Mayya <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I am filtering records based on the values of the inner bag in the inner
>> loop, as shown in the following sequence of Pig statements. I am getting
>> weird results depending on whether the first row satisfies the filter
>> condition or not.
>>
>> I am using pig-0.2.0.jar downloaded from the Pig site. The scripts are
>> running in local mode.
>>
>>
>> grunt> a = load 'c' as
>> (first:chararray,second:int);
>>
>> grunt> dump a;
>> 2009-07-07 19:16:34,549 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
>> 2009-07-07 19:16:34,549 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
>> (a,10)
>> (a,10)
>> (a,11)
>> (b,10)
>> (b,10)
>> (a,10)
>> (a,10)
>> (a,10)
>> (b,12)
>> (c,13)
>> (c,14)
>> grunt> b = group a by first;
>> grunt> dump b
>> 2009-07-07 19:16:41,703 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
>> 2009-07-07 19:16:41,703 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
>> (a,{(a,10),(a,10),(a,11),(a,10),(a,10),(a,10)})
>> (b,{(b,10),(b,10),(b,12)})
>> (c,{(c,13),(c,14)})
>>
>> First I apply a filter on the inner bag that the first tuple (the one with
>> 'a') satisfies, so it is part of the result. I get the results as expected.
>> No issues.
>>
>> grunt> c = foreach b {
>>>> filter1 = filter a by (second == 10);
>>>> generate group,COUNT(filter1);
>>>> }
>> grunt> dump c;
>> 2009-07-07 19:18:54,835 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
>> 2009-07-07 19:18:54,835 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
>> (a,5L)
>> (b,2L)
>>
>>
>> Now assume I apply a filter that the first tuple (the tuple with 'a') does
>> not satisfy.
>>
>> I don't get any output! Ideally the keys 'b' and 'c' should have come in the
>> result set.
>>
>> grunt> c = foreach b {
>>>> filter1 = filter a by (second != 10 and second != 11);
>>>> generate group,COUNT(filter1);
>>>> }
>> grunt> dump c;
>> 2009-07-07 19:18:24,681 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
>> 2009-07-07 19:18:24,681 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher -