Pig >> mail # user >> Slow tutorial?


Mark Snow 2008-06-26, 03:07
Amir Youssefi 2008-06-26, 19:29
RE: Slow tutorial?
Checking the code that was just committed, I see that the defaults are there:

-    private static long gcActivationSize = Long.MAX_VALUE ;
-    private static long spillFileSizeThreshold = 0L ;
+    // if we freed at least this much, invoke GC
+    // (default 40 MB - this can be overridden by user supplied property)
+    private static long gcActivationSize = 40000000L ;
    
+    // spill file size should be at least this much
+    // (default 5MB - this can be overridden by user supplied property)
+    private static long spillFileSizeThreshold = 5000000L ;
+    
+    // this will keep track of memory freed across spills
+    // and between GC invocations
+    private static long accumulatedFreeSize = 0L;
+    
+    // fraction of biggest heap for which we want to get
+    // "memory usage threshold exceeded" notifications
+    private static double memoryThresholdFraction = 0.7;
+    
+    // fraction of biggest heap for which we want to get
+    // "collection threshold exceeded" notifications
+    private static double collectionMemoryThresholdFraction = 0.5;
So I am running it again to see how it goes this time.
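
For readers following the diff, here is a minimal sketch of how these fields could interact, based only on the comments in the diff. The class name SpillTracker and its methods are hypothetical and are not taken from Pig's source; only the field names, the default values, and the java.lang.management calls are real.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Hypothetical illustration of the thresholds in the diff; not Pig's actual class. */
final class SpillTracker {
    private final long gcActivationSize;        // freed bytes that justify a GC request
    private final long spillFileSizeThreshold;  // smallest size worth spilling to disk
    private long accumulatedFreeSize = 0L;      // freed bytes since the last GC request

    SpillTracker(long gcActivationSize, long spillFileSizeThreshold) {
        this.gcActivationSize = gcActivationSize;
        this.spillFileSizeThreshold = spillFileSizeThreshold;
    }

    /** Only objects at least as large as the threshold are worth a spill file. */
    boolean worthSpilling(long estimatedSize) {
        return estimatedSize >= spillFileSizeThreshold;
    }

    /**
     * Records bytes freed by a spill. Returns true once enough has accumulated
     * to justify a GC request, and resets the counter in that case.
     */
    boolean recordFreed(long freedBytes) {
        accumulatedFreeSize += freedBytes;
        if (accumulatedFreeSize >= gcActivationSize) {
            accumulatedFreeSize = 0L;
            return true; // the caller could invoke System.gc() here
        }
        return false;
    }

    /**
     * Sets usage and collection-usage thresholds on the largest heap pool that
     * supports them, as fractions of that pool's maximum size.
     * Returns the chosen pool, or null if no suitable pool exists.
     */
    static MemoryPoolMXBean armLargestHeapPool(double usageFraction,
                                               double collectionFraction) {
        MemoryPoolMXBean biggest = null;
        long biggestMax = 0L;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            long max = usage.getMax(); // -1 when the maximum is undefined
            if (pool.getType() == MemoryType.HEAP
                    && pool.isUsageThresholdSupported()
                    && max > biggestMax) {
                biggest = pool;
                biggestMax = max;
            }
        }
        if (biggest == null) {
            return null;
        }
        biggest.setUsageThreshold((long) (biggestMax * usageFraction));
        if (biggest.isCollectionUsageThresholdSupported()) {
            biggest.setCollectionUsageThreshold((long) (biggestMax * collectionFraction));
        }
        return biggest;
    }
}
```

Under this reading, the old defaults (a spill threshold of 0 and a GC activation size of Long.MAX_VALUE) would let every object qualify for spilling and would never request a GC. With the new defaults, new SpillTracker(40000000L, 5000000L) skips objects under 5 MB and asks for a GC after 40 MB has been freed. Receiving the "threshold exceeded" notifications would additionally require registering a listener on the platform memory bean, which the sketch leaves out.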

Amir

-----Original Message-----
From: Amir Youssefi [mailto:[EMAIL PROTECTED]]
Sent: Thursday, June 26, 2008 12:30 PM
To: [EMAIL PROTECTED]
Subject: RE: Slow tutorial?

Hi Mark,

 The pig.jar that comes with the tutorial is old and doesn't include pig.properties.

 Try making a new build (June 26th or later) and make sure you have
these in pig.properties:

#Do not spill temp files smaller than this size (bytes)
pig.spill.size.threshold=5000000
#EXPERIMENT: Activate garbage collection when spilling a file bigger than this size (bytes)
#This should help reduce the number of files being spilled.
pig.spill.gc.activation.size=40000000

or similar numbers...
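
As a rough illustration of how a build might pick these two keys up, here is a sketch that reads them with java.util.Properties and falls back to the numbers above when a key is missing or not a number. The class SpillSettings and its methods are hypothetical and only illustrate the idea; the two property names are the ones listed above, and this is not how Pig itself is confirmed to load them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical holder for the two spill settings; not part of Pig. */
final class SpillSettings {
    // Fallback values mirror the numbers suggested in the message.
    static final long DEFAULT_SPILL_SIZE_THRESHOLD = 5000000L;
    static final long DEFAULT_GC_ACTIVATION_SIZE = 40000000L;

    final long spillSizeThreshold;
    final long gcActivationSize;

    private SpillSettings(long spillSizeThreshold, long gcActivationSize) {
        this.spillSizeThreshold = spillSizeThreshold;
        this.gcActivationSize = gcActivationSize;
    }

    /** Builds settings from already loaded properties. */
    static SpillSettings fromProperties(Properties props) {
        return new SpillSettings(
                readLong(props, "pig.spill.size.threshold", DEFAULT_SPILL_SIZE_THRESHOLD),
                readLong(props, "pig.spill.gc.activation.size", DEFAULT_GC_ACTIVATION_SIZE));
    }

    /** Parses the text of a properties file; lines starting with '#' are comments. */
    static SpillSettings parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fromProperties(props);
    }

    /** Returns the value for key as a long, or the fallback if absent or malformed. */
    private static long readLong(Properties props, String key, long fallback) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return fallback;
        }
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

Calling SpillSettings.parse on the contents of pig.properties makes it easy to confirm that both values are actually being read before starting a long run.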

Amir

-----Original Message-----
From: Mark Snow [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 25, 2008 8:07 PM
To: [EMAIL PROTECTED]
Subject: Slow tutorial?

Hi All,

I downloaded the pig tutorial to give it a whirl, set it up on a hadoop
cluster I've used for a few other tasks (7 nodes, ec2) and went through
the instructions to launch tutorial script1 with the excite bz file on
hdfs. Two things jumped out:

1) Only one mapper launched
2) It's really slow. It's been almost 5 hours and the mapper is still
under 10% complete

Have I misconfigured something? What's a good benchmark run time for the
tutorial scripts to complete?

      
Amir Youssefi 2008-06-26, 21:06
Amir Youssefi 2008-06-26, 21:12