RE: Job setup for a pig run takes ages
This is the jstack output taken during the setup time; I'm not exactly sure how to interpret it.

Thanks.
Dan

[dli@hmaster run]$ jstack 15640
2012-06-18 17:32:47
Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b17 mixed mode):

"Attach Listener" daemon prio=10 tid=0x0000000055dcb800 nid=0x431d waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Low Memory Detector" daemon prio=10 tid=0x0000000055105000 nid=0x3d3b runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x0000000055103000 nid=0x3d3a waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x0000000055100000 nid=0x3d39 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00000000550fe000 nid=0x3d38 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00000000550de800 nid=0x3d37 in Object.wait() [0x0000000041d7a000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aaab48a3cf8> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
        - locked <0x00002aaab48a3cf8> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x00000000550dc800 nid=0x3d36 in Object.wait() [0x0000000041093000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aaab48a3cb0> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x00002aaab48a3cb0> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x0000000055065800 nid=0x3d25 runnable [0x0000000041653000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.pig.newplan.logical.expression.ProjectExpression.getFieldSchema(ProjectExpression.java:164)
        at org.apache.pig.newplan.logical.relational.LOInnerLoad.getSchema(LOInnerLoad.java:59)
        at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:114)
        at org.apache.pig.newplan.logical.relational.LOInnerLoad.accept(LOInnerLoad.java:109)
        at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
        at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:94)
        at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:71)
        at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
        at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
        at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
        at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:113)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:281)
        at org.apache.pig.PigServer.compilePp(PigServer.java:1365)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1207)
        at org.apache.pig.PigServer.execute(PigServer.java:1201)
        at org.apache.pig.PigServer.access$100(PigServer.java:129)
        at org.apache.pig.PigServer$Graph.execute(PigServer.java:1528)
        at org.apache.pig.PigServer.executeBatchEx(PigServer.java:373)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:340)
        at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:115)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
        at org.apache.pig.Main.run(Main.java:396)
        at org.apache.pig.Main.main(Main.java:107)

"VM Thread" prio=10 tid=0x00000000550d8800 nid=0x3d35 runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000055078800 nid=0x3d26 runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x000000005507a800 nid=0x3d27 runnable

"GC task thread#2 (ParallelGC)" prio=10 tid=0x000000005507c000 nid=0x3d28 runnable

"GC task thread#3 (ParallelGC)" prio=10 tid=0x000000005507e000 nid=0x3d29 runnable

"GC task thread#4 (ParallelGC)" prio=10 tid=0x0000000055080000 nid=0x3d2a runnable

"GC task thread#5 (ParallelGC)" prio=10 tid=0x0000000055081800 nid=0x3d2b runnable

"GC task thread#6 (ParallelGC)" prio=10 tid=0x0000000055083800 nid=0x3d2c runnable

"GC task thread#7 (ParallelGC)" prio=10 tid=0x0000000055085800 nid=0x3d2d runnable

"GC task thread#8 (ParallelGC)" prio=10 tid=0x0000000055087000 nid=0x3d2e runnable

"GC task thread#9 (ParallelGC)" prio=10 tid=0x0000000055089000 nid=0x3d2f runnable

"GC task thread#10 (ParallelGC)" prio=10 tid=0x000000005508b000 nid=0x3d30 runnable

"GC task thread#11 (ParallelGC)" prio=10 tid=0x000000005508c800 nid=0x3d31 runnable

"GC task thread#12 (ParallelGC)" prio=10 tid=0x000000005508e800 nid=0x3d32 runnable

"GC task thread#13 (ParallelGC)" prio=10 tid=0x0000000055090800 nid=0x3d33 runnable

"GC task thread#14 (ParallelGC)" prio=10 tid=0x0000000055092800 nid=0x3d34 runnable

"VM Periodic Task Thread" prio=10 tid=0x0000000055110000 nid=0x3d3c waiting on condition

JNI global references: 1463
From: Dmitriy Ryaboy [mailto:[EMAIL PROTECTED]]
Sent: Saturday, June 16, 2012 8:24 AM
To: [EMAIL PROTECTED]
Subject: Re: Job setup for a pig run takes ages

What loader are you using? The JobTracker is not the place to look; try running jstack on your Pig process. Most likely it's talking to the NameNode most of the time because the loader is doing some per-file lookups.
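
A minimal sketch of how jstack output from the Pig client could be summarized, assuming the dump text has already been captured (for example by running jstack several times, a few seconds apart). The helper names `parse_jstack` and `top_frames` are hypothetical and not part of Pig or the JDK; the parsing relies only on the line layout visible in a HotSpot thread dump.

```python
import re
from collections import Counter

# Thread header, e.g.: "main" prio=10 tid=0x... nid=0x... runnable [0x...]
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# Thread state, e.g.:    java.lang.Thread.State: WAITING (on object monitor)
STATE = re.compile(r'^\s+java\.lang\.Thread\.State:\s+(?P<state>\w+)')
# Stack frame, e.g.:     at org.apache.pig.Main.main(Main.java:107)
FRAME = re.compile(r'^\s+at\s+(?P<frame>.+?)\s*$')


def parse_jstack(dump_text):
    """Return {thread_name: {"state": str or None, "frames": [str, ...]}}."""
    threads = {}
    current = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            # Start collecting lines for a new thread.
            current = {"state": None, "frames": []}
            threads[header.group("name")] = current
            continue
        if current is None:
            continue  # preamble before the first thread header
        state = STATE.match(line)
        if state:
            current["state"] = state.group("state")
            continue
        frame = FRAME.match(line)
        if frame:
            current["frames"].append(frame.group("frame"))
    return threads


def top_frames(dumps, thread_name="main", depth=1):
    """Count the top `depth` frames of one thread across several dumps."""
    counts = Counter()
    for dump_text in dumps:
        info = parse_jstack(dump_text).get(thread_name)
        if info and info["frames"]:
            counts[tuple(info["frames"][:depth])] += 1
    return counts
```

If most snapshots show the same top frame for `main`, that is probably where the setup time goes. A single snapshot is only a hint. Applied to the dump earlier in this thread, the sketch would report `main` as RUNNABLE with `ProjectExpression.getFieldSchema` as the top frame, reached through `PlanOptimizer.optimize`.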

On