Pig, mail # user - Slow tutorial?


Mark Snow 2008-06-26, 03:07
Amir Youssefi 2008-06-26, 19:29
Amir Youssefi 2008-06-26, 19:53
RE: Slow tutorial?
Amir Youssefi 2008-06-26, 21:06

 I built the latest pig.jar and tested defaults/pig.properties with PIG-235.

 Local mode is still running after half an hour and may not finish for
hours.

 On 3 nodes in Hadoop/MapReduce mode it ran in less than 10 minutes
(similar to the old runs we had).

Amir
-----Original Message-----
From: Amir Youssefi [mailto:[EMAIL PROTECTED]]
Sent: Thursday, June 26, 2008 12:30 PM
To: [EMAIL PROTECTED]
Subject: RE: Slow tutorial?

Hi Mark,

 The pig.jar that comes with the tutorial is old and doesn't have pig.properties.

 Try making a new build (June 26th or later) and make sure you have
these in pig.properties:

#Do not spill temp files smaller than this size (bytes)
pig.spill.size.threshold=5000000
#EXPERIMENT: Activate garbage collection when spilling a file bigger than this size (bytes)
#This should help reduce the number of files being spilled.
pig.spill.gc.activation.size=40000000

or similar numbers...
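
A minimal sketch, assuming pig.properties is a plain key=value file with # comments: the Python below checks that both spill settings are defined. The helper names (parse_properties, missing_spill_settings) are hypothetical and not part of Pig. Note that a comment wrapped onto a line without a leading # would no longer be read as a comment, so each comment should stay on its own # line.

```python
# Hypothetical helper, not part of Pig: verify that a pig.properties text
# defines the two spill settings quoted in this message.

REQUIRED_KEYS = ("pig.spill.size.threshold", "pig.spill.gc.activation.size")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        if "=" not in line:
            continue  # ignore lines this simplified parser does not understand
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_spill_settings(text):
    """Return the required keys that are absent from the properties text."""
    props = parse_properties(text)
    return [key for key in REQUIRED_KEYS if key not in props]


# Sample content mirroring the values suggested above.
SAMPLE = """\
#Do not spill temp files smaller than this size (bytes)
pig.spill.size.threshold=5000000
#EXPERIMENT: Activate garbage collection when spilling a file bigger than this size (bytes)
#This should help reduce the number of files being spilled.
pig.spill.gc.activation.size=40000000
"""

if __name__ == "__main__":
    print(missing_spill_settings(SAMPLE))
```

To check a real file, read its contents and pass the text to missing_spill_settings; an empty list means both keys are present.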

Amir

-----Original Message-----
From: Mark Snow [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 25, 2008 8:07 PM
To: [EMAIL PROTECTED]
Subject: Slow tutorial?

Hi All,

I downloaded the Pig tutorial to give it a whirl, set it up on a Hadoop
cluster I've used for a few other tasks (7 nodes, EC2), and went through
the instructions to launch tutorial script1 with the excite bz file on
HDFS. Two things jumped out:

1) Only one mapper launched
2) It's really slow. It's been almost 5 hours and the mapper is still
under 10% complete

Have I misconfigured something? What's a good benchmark run time for the
tutorial scripts to complete?

      
Amir Youssefi 2008-06-26, 21:12