Matthew Smith 2010-08-04, 22:07
Hey,

While running in Java a LIMIT statement is not getting executed.

/code
myServer.registerQuery("flow_firstcut = FOREACH data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
myServer.registerQuery("filtered = FILTER flow_firstcut BY sIP matches 'someIP';");
myServer.registerQuery("O = ORDER filtered BY bytes DESC;");
myServer.registerQuery("topTen = LIMIT O 10;");
myServer.store("topTen", outputFilePath);
/code

This produces a 699 line file. It should produce a 10 line file.

/code
myServer.registerQuery("flow_firstcut = FOREACH data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
myServer.registerQuery("filtered = FILTER flow_firstcut BY sIP matches '"+parameters[1]+"';");
//myServer.registerQuery("O = ORDER filtered BY bytes DESC;");
myServer.registerQuery("topTen = LIMIT filtered 10;");
myServer.store("topTen", outputFilePath);
/code

This produces a 10 line file.

Is there a known bug I am unaware of, or can you not order then limit? http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT indicates that this is a valid sequence of calls.

Help?

Matt
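For anyone reproducing the 699-versus-10 line count, a small standalone check can make the comparison repeatable. This is only a sketch, not part of the original post: the class name `StoredOutputCheck` is made up, and it assumes the stored result is either a single text file or a directory of Hadoop-style `part-*` files with one record per line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: counts the records a STORE call wrote.
class StoredOutputCheck {

    // Assumption: output is one text file, or a directory holding "part-*" files.
    static long countRecords(Path output) throws IOException {
        if (Files.isRegularFile(output)) {
            return countLines(output);
        }
        List<Path> parts;
        try (Stream<Path> entries = Files.list(output)) {
            // Skip checksum and log entries; only data files start with "part-".
            parts = entries
                    .filter(p -> p.getFileName().toString().startsWith("part-"))
                    .collect(Collectors.toList());
        }
        long total = 0;
        for (Path part : parts) {
            total += countLines(part);
        }
        return total;
    }

    private static long countLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.count();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: StoredOutputCheck <outputPath> <limit>");
            System.exit(2);
        }
        long records = countRecords(Paths.get(args[0]));
        long limit = Long.parseLong(args[1]);
        System.out.println(records + " record(s) stored, LIMIT was " + limit);
        // A non-zero exit code signals that more rows were stored than LIMIT allows.
        System.exit(records > limit ? 1 : 0);
    }
}
```

Pointing it at `outputFilePath` with a limit of 10 after each variant of the script would show directly whether the LIMIT took effect.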
Ashutosh Chauhan 2010-08-05, 16:53
Matt,

Which version are you on? What happens if you run your query through grunt instead of PigServer? I tried a load-order-limit sequence on a small dataset in grunt and got the expected results.

Ashutosh
Matthew Smith 2010-08-05, 17:57
No, I have not used it in grunt. I am looking to use PigServer because of the parameter passing that is doable through Java. I am using Pig 0.7.0.
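The parameter passing mentioned here can be kept separate from the PigServer calls, which makes the generated Pig Latin easy to inspect. The following is a rough sketch under assumptions: the class `TopTalkersQuery` and its validation rule are illustrative and not from the original code, and the `data` alias is assumed to be defined by an earlier LOAD that the thread does not show.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical builder for the four statements quoted in the thread.
class TopTalkersQuery {

    // Accept only dotted-quad-looking input so the value cannot break out of
    // the single-quoted Pig Latin literal. This rule is an assumption.
    private static final Pattern DOTTED_QUAD =
            Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    static List<String> statements(String sourceIp, int limit) {
        if (!DOTTED_QUAD.matcher(sourceIp).matches()) {
            throw new IllegalArgumentException("not an IPv4 address: " + sourceIp);
        }
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive: " + limit);
        }
        return Arrays.asList(
                "flow_firstcut = FOREACH data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;",
                "filtered = FILTER flow_firstcut BY sIP matches '" + sourceIp + "';",
                "O = ORDER filtered BY bytes DESC;",
                "topTen = LIMIT O " + limit + ";");
    }

    public static void main(String[] args) {
        // Print the statements so they can be pasted into grunt for comparison.
        for (String statement : statements("61.81.46.45", 10)) {
            System.out.println(statement);
        }
    }
}
```

Each returned string would be handed to `myServer.registerQuery(...)` in order, as in the original post, and the same strings can be pasted into grunt so both paths run identical statements.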
Ashutosh Chauhan 2010-08-05, 19:10
To cut down on the problem space, can you try your query in grunt? If it works there, the problem has something to do with PigServer; otherwise it is related to Pig core itself.

Ashutosh
Matthew Smith 2010-08-05, 21:54
While running grunt I ran into another error. I see it is looking for another file, but I have never run into this problem with grunt before. This environment was freshly installed this morning before the grunt shell was executed.
I also checked my PigServer() Java code on the new install, and it still produces a 699 line file which is ORDERed but not LIMITed.
Thoughts?

grunt> A = LOAD '0' USING PigStorage('|') as (sIP:chararray,dIP:chararray,sPort:int, dPort:int,protocol:int, bytes:int, flags:chararray);
grunt> B = FILTER A BY sIP matches '61.81.46.45';
grunt> C = ORDER B BY bytes DESC;
grunt> D = LIMIT C 10;
grunt> DUMP D;

2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for A
2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for A
2010-08-05 14:47:52,681 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2010-08-05 14:47:52,819 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp1184504472/tmp-1623830760:org.apache.pig.builtin.BinStorage) - 1-54 Operator Key: 1-54)
2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
2010-08-05 14:47:52,911 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:52,934 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:52,935 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-08-05 14:47:54,187 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-08-05 14:47:54,228 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,229 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-08-05 14:47:54,246 [Thread-5] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-08-05 14:47:54,434 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,455 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:47:54,734 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2010-08-05 14:47:54,754 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,757 [Thread-14] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:47:54,757 [Thread-14] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:47:54,821 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,827 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,839 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,841 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:55,245 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2010-08-05 14:47:56,352 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2010-08-05 14:47:56,354 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0001_m_000000_0 is allowed to commit now
2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0001_m_000000_0' to file:/tmp/temp1184504472/tmp-842564749
2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0001_m_000000_0' done.
2010-08-05 14:47:59,754 [main] INFO org.apache.pig.backe
Ashutosh Chauhan 2010-08-06, 06:43
This is most likely because B is empty. Do:

grunt> dump A; -- to verify data is getting loaded as you are expecting.
grunt> dump B; -- to verify that B is non-empty.
Ashutosh
Matthew Smith 2010-08-06, 14:16
B is not empty:

(58.72.19.26, 58.72.19.26,38627,22196,6,512, FS PA)
(58.72.19.26, 36.65.53.83,44133,10957,6,646, FS PA)
(58.72.19.26, 68.99.24.4,43951,11023,6,364, FS PA)
(58.72.19.26, 9.7.68.69,18644,20524,17,228, FS PA)
(58.72.19.26, 73.77.82.19,25,1024,6,194, FS PA)
(58.72.19.26, 36.65.53.83,56380,71718,6,1003, FS PA)
(58.72.19.26, 58.72.19.26,10221,44938,6,277, FS PA)
(58.72.19.26, 77.52.5.64,69247,11023,6,389, FS PA)
(58.72.19.26, 93.6.87.73,38149,1024,6,138, FS PA)
(58.72.19.26, 58.72.19.26,11558,24292,6,812, FS PA)
(58.72.19.26, 58.72.19.26,65668,71318,6,175, FS PA)
(58.72.19.26, 68.99.24.4,61923,1024,6,1598, FS PA)
(58.72.19.26, 60.41.59.65,22421,65796,6,1402, FS PA)
(58.72.19.26, 58.72.19.26,69740,21873,6,322, S A)
(58.72.19.26, 95.70.58.21,11058,1024,6,1453, FS PA)
(58.72.19.26, 42.10.50.36,44863,11023,6,251, FS PA)
(58.72.19.26, 57.6.91.5,25857,1024,6,1546, FS PA)
(58.72.19.26, 68.99.24.4,54756,11023,6,219, FS PA)
(58.72.19.26, 36.65.53.83,73335,43857,6,9, FS PA)
(58.72.19.26, 95.70.58.21,32204,11023,6,1635, S A)
(58.72.19.26, 76.48.82.73,46483,1024,6,127, FS PA)
(58.72.19.26, 81.88.14.14,55609,1024,6,507, FS PA)
(58.72.19.26, 1.54.61.21,65763,1024,6,370, FS PA)

But after I do:

> grunt> C = ORDER B BY bytes DESC;
> grunt> Dump C;

I get the same error as before:

> java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160

Which would lead me to believe my ORDER is broken. Is there a conf I need to change?
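As a cross-check that does not involve Pig at all, the result that ORDER ... BY bytes DESC followed by LIMIT should give can be computed in plain Java. This is a sketch under assumptions: the class `OrderLimitReference` is hypothetical, and the pipe-delimited rows are reconstructed from the dumped tuples and the `PigStorage('|')` schema, since the raw input file is not shown in the thread.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical reference implementation of "order by bytes descending, then limit".
class OrderLimitReference {

    // Position of bytes in the schema: sIP, dIP, sPort, dPort, protocol, bytes, flags.
    static final int BYTES_FIELD = 5;

    static int bytesOf(String row) {
        return Integer.parseInt(row.split("\\|")[BYTES_FIELD].trim());
    }

    static List<String> topByBytes(List<String> rows, int limit) {
        return rows.stream()
                // Largest byte counts first, mirroring ORDER ... BY bytes DESC.
                .sorted(Comparator.comparingInt((String row) -> bytesOf(row)).reversed())
                // Keep at most `limit` rows, mirroring LIMIT.
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed raw form of four of the tuples dumped from B.
        List<String> rows = Arrays.asList(
                "58.72.19.26|58.72.19.26|38627|22196|6|512|FS PA",
                "58.72.19.26|36.65.53.83|44133|10957|6|646|FS PA",
                "58.72.19.26|68.99.24.4|43951|11023|6|364|FS PA",
                "58.72.19.26|68.99.24.4|61923|1024|6|1598|FS PA");
        for (String row : topByBytes(rows, 2)) {
            System.out.println(row);
        }
    }
}
```

Running the same function over the full filtered input with a limit of 10 gives a list to compare against whatever `DUMP D` or the stored `topTen` produces once ORDER runs.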
Ashutosh Chauhan 2010-08-09, 00:57
It looks like a bug then. Do you have a script and a small enough dataset that reproduce the issue and that you can upload to JIRA? If so, go ahead and create a JIRA ticket with the script and data. Are you using local mode or mapreduce mode?

Ashutosh