Pig >> mail # user >> How can I set the mapper number for pig script?


+
Sheng Guo 2012-06-23, 02:27
+
Jagat Singh 2012-06-23, 04:31
+
Sheng Guo 2012-06-23, 07:30
Re: How can I set the mapper number for pig script?
On Sat, Jun 23, 2012 at 3:30 AM, Sheng Guo <[EMAIL PROTECTED]> wrote:
> I know it is automatically set. But I have a large data set, I want it
> allocate more mappers during midnight so that more computing resource could
> be used to speed up.
> Any suggestions?

Pig uses CombineInputFormat by default, which attempts to combine a set
of physical input splits into one logical input split.
I use the following setting to control the number of mappers in some
of my benchmarking scripts:

-- combine up to this many bytes into a composite input split, i.e., per mapper
SET pig.maxCombinedSplitSize 250000000;

Note that your absolute minimum split size is constrained by the smallest
block size in your input set.
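
For a rough feel of how the setting maps to a mapper count, here is a
back-of-the-envelope sketch in Python. It is not how Pig actually computes
splits (that also depends on block boundaries and data locality); it only
assumes roughly one mapper per combined split. The function name
estimate_mappers and the 100 GB input size are made up for illustration.

```python
import math


def estimate_mappers(total_input_bytes: int, max_combined_split_size: int) -> int:
    """Rough estimate only: one mapper per combined split of at most
    max_combined_split_size bytes. Ignores block boundaries and locality."""
    if max_combined_split_size <= 0:
        raise ValueError("max_combined_split_size must be positive")
    if total_input_bytes <= 0:
        return 0
    return math.ceil(total_input_bytes / max_combined_split_size)


if __name__ == "__main__":
    # Hypothetical 100 GB input with the 250000000-byte value from the SET line.
    print(estimate_mappers(100 * 10**9, 250_000_000))  # 400
```

So under these assumptions, lowering the value should give more (smaller)
mappers and raising it should give fewer, down to the block-size limit
mentioned above.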
+
Scott Foster 2012-06-23, 16:40
+
Sheng Guo 2012-06-23, 20:48
+
Yang 2012-06-23, 21:58
+
John Meagher 2012-06-23, 23:15
+
Scott Foster 2012-06-26, 23:47