-
Question on filter inside FOREACH inner loop
Gururaj S Mayya 2009-07-07, 13:55
Hi,
I am filtering records based on the values in the inner bag, inside the FOREACH inner loop, as shown in the following sequence of Pig statements. I am getting weird results that depend on whether the first row satisfies the filter condition or not.
I am using pig-0.2.0.jar, downloaded from the Pig site. The scripts are running in local mode.

grunt> a = load 'c' as (first:chararray,second:int);
grunt> dump a;
2009-07-07 19:16:34,549 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-07-07 19:16:34,549 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(a,10)
(a,10)
(a,11)
(b,10)
(b,10)
(a,10)
(a,10)
(a,10)
(b,12)
(c,13)
(c,14)
grunt> b = group a by first;
grunt> dump b
2009-07-07 19:16:41,703 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-07-07 19:16:41,703 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(a,{(a,10),(a,10),(a,11),(a,10),(a,10),(a,10)})
(b,{(b,10),(b,10),(b,12)})
(c,{(c,13),(c,14)})
First I apply a filter on the inner bag that the first group (the one with key 'a') satisfies, so it is part of the result. I get the results as expected. No issues.

grunt> c = foreach b {
>> filter1 = filter a by (second == 10);
>> generate group,COUNT(filter1);
>> }
grunt> dump c;
2009-07-07 19:18:54,835 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-07-07 19:18:54,835 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(a,5L)
(b,2L)

Now assume I apply a filter that the first group (the one with key 'a') does not satisfy.

I don't get any output at all! Ideally the keys 'b' and 'c' should have appeared in the result set.

grunt> c = foreach b {
>> filter1 = filter a by (second != 10 and second != 11);
>> generate group,COUNT(filter1);
>> }
grunt> dump c;
2009-07-07 19:18:24,681 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-07-07 19:18:24,681 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
grunt>

Am I missing something here?
On a related note, illustrate fails on relation 'c':

grunt> illustrate c;
2009-07-07 19:24:40,647 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. org.apache.pig.data.DefaultDataBag cannot be cast to org.apache.pig.data.Tuple
Details at logfile: /home/gururaj/Grid/pig_1246973396841.log

But if I do 'dump c', I still get either the result or just "Success!!".
thx, -- ~gururaj --
The 6 stages of every project are: --------------------------------- Enthusiasm. Disillusionment. Panic. The Search For The Guilty. The Punishment of the Innocent. Accolades for the Non-Participants.
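For reference, the counts the two nested filters in the message above would be expected to produce can be worked out with a small model. This is a sketch in Python, not Pig: the helper names `group_by_first` and `filtered_counts` are made up for illustration, and it assumes that a group whose filtered bag is empty still yields one output row with a count of 0.

```python
from collections import defaultdict

# Rows copied from the 'dump a' output in the message above.
rows = [("a", 10), ("a", 10), ("a", 11), ("b", 10), ("b", 10), ("a", 10),
        ("a", 10), ("a", 10), ("b", 12), ("c", 13), ("c", 14)]


def group_by_first(rows):
    """Model of: b = group a by first; (hypothetical helper)."""
    bags = defaultdict(list)
    for first, second in rows:
        bags[first].append((first, second))
    return dict(bags)


def filtered_counts(bags, predicate):
    """Model of the nested FILTER plus COUNT: one output row per group.

    Assumption: an empty filtered bag still produces a row, with count 0.
    """
    return [(key, sum(1 for _, second in bag if predicate(second)))
            for key, bag in sorted(bags.items())]


bags = group_by_first(rows)
first_filter = filtered_counts(bags, lambda s: s == 10)
second_filter = filtered_counts(bags, lambda s: s != 10 and s != 11)
print(first_filter)
print(second_filter)
```

Under that assumption the second filter should give a count for every key, including 'b' and 'c'. The first filter would also be expected to give a row for 'c' with a count of 0, which the 0.2.0 output above does not show.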
-
Re: Question on filter inside FOREACH inner loop
Gururaj S Mayya 2009-07-07, 14:13
Hi,
Some more observations on the same issue. The example is modified from piglatin.pdf, so I am not sure what we are missing.

The FOREACH stops processing as soon as the filter leaves an inner bag empty.

Say we have four rows going into the FOREACH, of which Row 1 and Row 3 match the filter criteria.

Row 1: the inner bag has tuples that satisfy the filter condition, so the result is not an empty bag. This row is output.
Row 2: the inner bag has no tuples that satisfy the filter condition. This row is not output, and the FOREACH STOPS. No further rows are checked.
thx ~gururaj
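The symptom described above (processing stops at the first group whose filtered bag is empty) can be checked against both observed outputs with a small model. This is a sketch in Python of the described behaviour only, not of Pig's actual internals; the function `counts_per_group` and its `stop_on_empty` flag are hypothetical names introduced for illustration.

```python
# Grouped data copied from the 'dump b' output in the original message,
# kept in the order the groups were printed.
bags = [
    ("a", [("a", 10), ("a", 10), ("a", 11), ("a", 10), ("a", 10), ("a", 10)]),
    ("b", [("b", 10), ("b", 10), ("b", 12)]),
    ("c", [("c", 13), ("c", 14)]),
]


def counts_per_group(bags, predicate, stop_on_empty=False):
    """Count the tuples in each group whose second field passes the predicate.

    With stop_on_empty=True the loop ends at the first group whose filtered
    bag is empty, which models the symptom reported in this thread.
    """
    out = []
    for key, bag in bags:
        kept = [t for t in bag if predicate(t[1])]
        if stop_on_empty and not kept:
            break  # reported symptom: no further groups are processed
        out.append((key, len(kept)))
    return out


def eq10(second):
    return second == 10


def not_10_or_11(second):
    return second != 10 and second != 11


symptom_first = counts_per_group(bags, eq10, stop_on_empty=True)
symptom_second = counts_per_group(bags, not_10_or_11, stop_on_empty=True)
expected_second = counts_per_group(bags, not_10_or_11)
print(symptom_first, symptom_second, expected_second)
```

With `stop_on_empty=True` the model gives two rows for the first filter and none for the second, which matches both dumps in the original message. Without the early stop, every group gets a row.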
-
Re: Question on filter inside FOREACH inner loop
Alan Gates 2009-07-07, 19:21
I suspect this is PIG-514 (http://issues.apache.org/jira/browse/PIG-514), which was resolved in Pig 0.3.0. Can you try your script with that version and see if it resolves the problem?

Alan.
-
RE: Question on filter inside FOREACH inner loop
Santhosh Srinivasan 2009-07-07, 21:04
It works with the latest trunk and should work with 0.3.0. A sample run follows:

grunt> a = load '/user/sms/data/filter.data' as (field1, field2);
grunt> dump a;

(a,10) (a,10) (a,11) (b,10) (b,10) (a,10) (a,10) (a,10) (b,12) (c,13) (c,14)

grunt> b = group a by field1;
grunt> dump b;

(a,{(a,10),(a,10),(a,11),(a,10),(a,10),(a,10)})
(b,{(b,10),(b,10),(b,12)})
(c,{(c,13),(c,14)})

grunt> c = foreach b { filter1 = filter a by (field2 != 10 and field2 != 11); generate group, COUNT(filter1);};
grunt> dump c;

(a,0L) (b,1L) (c,2L)

Santhosh
-
Re: Question on filter inside FOREACH inner loop
Gururaj S Mayya 2009-07-08, 03:21
Hi All,

Thanks for the replies. I tried it on the latest trunk and it works fine. The illustrate problem I mentioned in my original message also seems to be fixed.

thx
~gururaj