Pig, mail # user - How can I set the mapper number for pig script?


Sheng Guo 2012-06-23, 02:27
Jagat Singh 2012-06-23, 04:31
Sheng Guo 2012-06-23, 07:30
Re: How can I set the mapper number for pig script?
Stan Rosenberg 2012-06-23, 15:13
On Sat, Jun 23, 2012 at 3:30 AM, Sheng Guo <[EMAIL PROTECTED]> wrote:
> I know it is automatically set. But I have a large data set, I want it
> allocate more mappers during midnight so that more computing resource could
> be used to speed up.
> Any suggestions?

Pig uses CombineInputFormat by default, which attempts to combine a set
of physical input splits into one logical input split.
I use the following setting to control the number of mappers in some
of my benchmarking scripts:

-- combine up to this many bytes into a composite input split, i.e., per mapper
SET pig.maxCombinedSplitSize 250000000;

Note that your absolute minimum is constrained by the smallest block
size in your input set.
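
To make the effect of that setting concrete, here is a rough sketch of a
script that asks for more mappers by lowering the combined split size. The
input and output paths, the relation names, and the 100 GB input size are
made up for illustration, and the mapper counts in the comments are only
estimates (total input bytes divided by the combined split size); the
actual number depends on how the files and blocks are laid out.

```pig
-- Split combination is on by default; it is set here only to be explicit.
SET pig.splitCombination true;

-- Combine up to ~64 MB per mapper instead of 250 MB.
-- Rough estimate for a hypothetical 100 GB input:
--   100,000,000,000 / 250,000,000 = about 400 mappers
--   100,000,000,000 /  64,000,000 = about 1,560 mappers
SET pig.maxCombinedSplitSize 64000000;

-- Hypothetical paths and relation names, for illustration only.
raw = LOAD '/data/input' USING PigStorage('\t');
STORE raw INTO '/data/output';
```

A smaller value means more, smaller mappers; as noted above, going below
the smallest block size in the input should not buy any further mappers.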
Scott Foster 2012-06-23, 16:40
Sheng Guo 2012-06-23, 20:48
Yang 2012-06-23, 21:58
John Meagher 2012-06-23, 23:15
Scott Foster 2012-06-26, 23:47