Internal error 2999 - misuse of CONCAT? misuse of GROUP?
william.dowling@... 2011-04-05, 20:10
I am a new Pig user and have run into “Internal error 2999”.
2011-04-05 15:59:57,445 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null Details at logfile: /proj/CitationSystem/backend/hadoop/testbed-hold/pig_1302033581143.log
That shows:
Pig Stack Trace
---------------
ERROR 2999: Unexpected internal error. null

java.lang.NullPointerException
    at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.getLoadFuncSpec(TypeCheckingVisitor.java:3116)
    at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:1793)
    at org.apache.pig.impl.logicalLayer.LOCast.visit(LOCast.java:67)
    at org.apache.pig.impl.logicalLayer.LOCast.visit(LOCast.java:32)
    at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:70)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.checkInnerPlan(TypeCheckingVisitor.java:2869)
    at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2430)
    at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:378)
    at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:45)
    at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:70)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:102)
    at org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
    at org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
    at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
    at org.apache.pig.impl.logicalLayer.UnionOnSchemaSetter.visit(UnionOnSchemaSetter.java:70)
    at org.apache.pig.impl.logicalLayer.LOUnion.visit(LOUnion.java:177)
    at org.apache.pig.impl.logicalLayer.LOUnion.visit(LOUnion.java:38)
    at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:70)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.PigServer.compileLp(PigServer.java:1317)
    at org.apache.pig.PigServer.compileLp(PigServer.java:1306)
    at org.apache.pig.PigServer.compileLp(PigServer.java:1241)
    at org.apache.pig.PigServer.compileLp(PigServer.java:1221)
    at org.apache.pig.PigServer.execute(PigServer.java:1178)
    at org.apache.pig.PigServer.access$100(PigServer.java:128)
    at org.apache.pig.PigServer$Graph.execute(PigServer.java:1517)
    at org.apache.pig.PigServer.executeBatchEx(PigServer.java:362)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:329)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:169)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
    at org.apache.pig.Main.run(Main.java:510)
    at org.apache.pig.Main.main(Main.java:107)
Most likely I am doing something wrong, so any advice would be appreciated. Here is my setup - I have a pig script like this:
[… statements define SrcFuid and NewCitationRel …]
TCRaw = join SrcFuid by citingdocid, NewCitationRel by citeddocid;
describe TCRaw;
dump TCRaw;
TCGroupedByFuid = group TCRaw by CONCAT((chararray)SrcFuid.citingdocid, SrcFuid.col, (chararray)SrcFuid.seq);
store TCGroupedByFuid into 'foo';

The log shows the output of the describe and dump commands (I’ve formatted for readability):
TCRaw: {SrcFuid::citingdocid: int, SrcFuid::col: bytearray, SrcFuid::seq: int, NewCitationRel::citeddocid: int, NewCitationRel::citingdocid: int, NewCitationRel::col: bytearray, NewCitationRel::seq: int, NewCitationRel::year: int, NewCitationRel::eds: bytearray}
(14159274,BCI,6,14159274,14159163,BCI,5,1999,BCI.BCI)
(14159274,BCI,6,14159274,14159163,WOS,11,1999,WOS.SCI)
(14159274,WOS,16,14159274,14159163,BCI,5,1999,BCI.BCI)
(14159274,WOS,16,14159274,14159163,WOS,11,1999,WOS.SCI)
What I was hoping for was something like

(‘14159274BCI6’, {(14159274,BCI,6,14159274,14159163,BCI,5,1999,BCI.BCI),
                  (14159274,BCI,6,14159274,14159163,WOS,11,1999,WOS.SCI)})
(‘14159274WOS16’, {(14159274,WOS,16,14159274,14159163,BCI,5,1999,BCI.BCI),
                   (14159274,WOS,16,14159274,14159163,WOS,11,1999,WOS.SCI)})

If anyone could give me a hint about what to do to get that, I’d appreciate it very much. Thanks!
Will
William F Dowling Sr Technical Specialist, Software Engineering Thomson Reuters 0 +1 215 823 3853
Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?
Thejas M Nair 2011-04-05, 21:59
Do you need the group key to be concatenated? If not, you can just group on all three columns:
TCGroupedByFuid = group TCRaw by (SrcFuid.citingdocid, SrcFuid.col, SrcFuid.seq);
If you want the group key to be concatenated for some reason, you can see if generating a concatenated string helps. Which version of Pig are you using? From the stack trace, it looks like an older version. This issue might have been fixed in a newer version of Pig.
-Thejas
Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?
Xiaomeng Wan 2011-04-05, 22:53
concat only takes two fields at a time. use concat(field1, concat(field2, field3))
Shawn
RE: Internal error 2999 - misuse of CONCAT? misuse of GROUP?
william.dowling@... 2011-04-06, 18:17
> concat only takes two fields at a time. use concat(field1, concat(field2, field3))

Hi Shawn,
Thanks for your response. I have tried supplying just two arguments to CONCAT(), but I get the same
ERROR 2999: Unexpected internal error. null
java.lang.NullPointerException
that I originally reported. It's a good tip to use only two arguments, but I think something else is (also) going on. Thanks!
Will
Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?
Xiaomeng Wan 2011-04-06, 19:27
TCGroupedByFuid = group TCRaw by CONCAT((chararray)SrcFuid.citingdocid, SrcFuid.col, (chararray)SrcFuid.seq);
should be
TCGroupedByFuid = group TCRaw by CONCAT((chararray)SrcFuid::citingdocid,CONCAT( SrcFuid::col, (chararray)SrcFuid::seq));
Shawn
RE: Internal error 2999 - misuse of CONCAT? misuse of GROUP?
william.dowling@... 2011-04-06, 18:09
>Do you need the group-key to be concatenated ? If not, you can just group on all the three columns -
>TCGroupedByFuid = group TCRaw by (SrcFuid.citingdocid, SrcFuid.col, SrcFuid.seq);
Hi Thejas,
I had tried that originally before introducing CONCAT(), but I got this error message:
ERROR 0: Scalar has more than one row in the output. 1st : (14159274,BCI,6), 2nd :(45937168,BCI,17)
I don't understand that, since TCRaw is
(14159274,BCI,6,14159274,14159163,BCI,5,1999,BCI.BCI)
(14159274,BCI,6,14159274,14159163,WOS,11,1999,WOS.SCI)
(14159274,WOS,16,14159274,14159163,BCI,5,1999,BCI.BCI)
(14159274,WOS,16,14159274,14159163,WOS,11,1999,WOS.SCI)
and the 2nd tuple is not a (projection of any) member of TCRaw (though it is a member of SrcFuid). So I think my understanding of GROUP is incorrect.
Thanks for your help!
Will
Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?
Thejas M Nair 2011-04-06, 19:30
In the relation TCRaw, there is no column called SrcFuid. As a result, you end up using this feature:
http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#Casting+Relations+to+Scalars

Change your statement to:

TCGroupedByFuid = group TCRaw by (citingdocid, col, seq);

Thanks,
Thejas
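For contrast, here is a minimal sketch of what the scalar feature linked above is meant for: referring to a one-row relation as if it were a single value. It is only an illustration. The aliases AllSrc, Total and Shares and the field name n are hypothetical and do not come from the thread; only SrcFuid and its field citingdocid do.

```pig
-- Hypothetical illustration of intentional scalar use; not code from the thread.
-- Total has exactly one row, so Total.n can be read as a single value.
AllSrc = group SrcFuid all;
Total  = foreach AllSrc generate COUNT(SrcFuid) as n;
Shares = foreach SrcFuid generate citingdocid, 1.0 / (double)Total.n as share;

-- By contrast, SrcFuid.citingdocid inside a statement on TCRaw asks Pig to
-- treat the multi-row relation SrcFuid as a scalar, which is what produces
-- "Scalar has more than one row in the output".
```

The point of the sketch is that `alias.field` is a projection from another relation, not a way to name a column of the relation being grouped.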
Re: Internal error 2999 - misuse of CONCAT? misuse of GROUP?
Thejas M Nair 2011-04-06, 19:42
This feature/syntax seems to be causing confusion in many cases, so I have proposed deprecating this syntax in the next release. See https://issues.apache.org/jira/browse/PIG-1967.

-Thejas
RE: Internal error 2999 - misuse of CONCAT? misuse of GROUP?
william.dowling@... 2011-04-06, 19:50
Hi Thejas,

Thanks again for your help. When I omit the SrcFuid "qualifier" and use the form you suggest, I get this error (that was actually the reason I tried SrcFuid.<field> to start with):

Pig Stack Trace
---------------
ERROR 1025: Found more than one match: SrcFuid::citingdocid, NewCitationRel::citingdocid

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Found more than one match: SrcFuid::citingdocid, NewCitationRel::citingdocid
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1551)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:523)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:868)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:388)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
    at org.apache.pig.Main.run(Main.java:510)
    at org.apache.pig.Main.main(Main.java:107)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Found more than one match: SrcFuid::citingdocid, NewCitationRel::citingdocid
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:7418)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:7226)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:5297)

But the good news is that I combined this suggestion with Shawn's and found that this works:

TCGroupedByFuid = group TCRaw by (SrcFuid::citingdocid, SrcFuid::col, SrcFuid::seq);

Thanks Thejas and Shawn!
Will
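Pulling the thread's findings together, a sketch of both grouping styles might look like the following. The composite-key form is the one reported to work above. The concatenated-key form combines Shawn's nested CONCAT with the `::` field names and was not confirmed in the thread, so treat it as an untested assumption. TCKeyed, fuid and TCGroupedByKey are hypothetical names introduced here.

```pig
-- Field names are the ones printed by `describe TCRaw`; after a join they
-- carry a relation:: prefix, which also resolves the ambiguity between
-- SrcFuid::citingdocid and NewCitationRel::citingdocid.

-- Composite key (reported to work in the thread).
TCGroupedByFuid = group TCRaw by (SrcFuid::citingdocid, SrcFuid::col, SrcFuid::seq);

-- Single string key such as 14159274BCI6 (assumption, not confirmed above).
-- CONCAT is nested because, per the thread, it takes two arguments at a time.
-- col is a bytearray in the schema, so it is cast to chararray as well.
TCKeyed = foreach TCRaw generate
    CONCAT((chararray)SrcFuid::citingdocid,
           CONCAT((chararray)SrcFuid::col, (chararray)SrcFuid::seq)) as fuid,
    *;
TCGroupedByKey = group TCKeyed by fuid;
```

Building the key in a separate foreach keeps the group statement simple and makes the key easy to inspect with describe or dump before grouping.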