
Pig user mailing list: How can I set the mapper number for pig script?


Re: How can I set the mapper number for pig script?
On Sat, Jun 23, 2012 at 3:30 AM, Sheng Guo <[EMAIL PROTECTED]> wrote:
> I know it is set automatically. But I have a large data set, and I want it to
> allocate more mappers around midnight so that more computing resources can
> be used to speed it up.
> Any suggestions?

Pig uses CombineInputFormat by default which attempts to combine a set
of physical input splits into one logical input split.
I use the following setting to control the number of mappers in some
of my benchmarking scripts:

-- combine up to this many bytes into one composite input split, i.e., per mapper
SET pig.maxCombinedSplitSize 250000000;
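As a rough, back-of-the-envelope estimate (a sketch, not Pig's actual packing algorithm), let S be the total input size in bytes, B_max the value of pig.maxCombinedSplitSize, and N the number of physical input splits (typically the HDFS blocks). Assuming no single physical split is larger than B_max, every mapper reads at most B_max bytes, so the mapper count M satisfies:

$$
\left\lceil \frac{S}{B_{\max}} \right\rceil \;\le\; M \;\le\; N
$$

For a hypothetical 100 GB (10^11 bytes) input stored as one file in 64 MiB blocks, the 250000000 setting above gives at least 400 mappers, and N is about 10^11 / 67,108,864, roughly 1,490. Halving B_max raises the lower estimate to 800, but no value of the setting yields more than about 1,490 mappers.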

Note that the absolute minimum split size is constrained by the smallest block
size in your input set, since combining can merge physical splits but never divide them.
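A minimal sketch of a nightly Pig script that asks for more mappers, assuming a hypothetical tab-separated input at /data/events with the fields shown; the paths, schema, and the 100000000-byte value are illustrative only. Lowering pig.maxCombinedSplitSize packs fewer physical splits into each combined split, so more mappers are launched. Setting pig.splitCombination to false (commented out below) disables combining, so each physical split gets its own mapper.

```pig
-- Illustrative value: cap each combined split at ~100 MB so more mappers are launched.
-- Combining never divides a physical split, so the mapper count tops out at the
-- number of physical splits no matter how low this is set.
SET pig.maxCombinedSplitSize 100000000;

-- Alternative: disable split combination so every physical split runs in its own mapper.
-- SET pig.splitCombination 'false';

-- Hypothetical input/output paths and schema, for illustration only.
raw     = LOAD '/data/events' USING PigStorage('\t') AS (user:chararray, bytes:long);
grouped = GROUP raw BY user;
totals  = FOREACH grouped GENERATE group AS user, SUM(raw.bytes) AS total_bytes;
STORE totals INTO '/data/events_totals' USING PigStorage('\t');
```

Because Pig parameter substitution is a text preprocessing step, the value could also be chosen per run instead of hard-coded, e.g. `SET pig.maxCombinedSplitSize $SPLIT_SIZE;` in the script and `pig -param SPLIT_SIZE=100000000 nightly.pig` for the midnight run (the parameter and script names are hypothetical).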