
Pig >> mail # user >> possibly Pig throttles the number of mappers

Re: possibly Pig throttles the number of mappers
Thanks Alan!

We are using 0.79. I also got an answer on the #hadoop channel and from this
Quora answer:


We will look into combining more work into each mapper and/or upgrading to Pig 0.8.

Thanks again for your help.


On Wed, Mar 23, 2011 at 5:55 PM, Alan Gates <[EMAIL PROTECTED]> wrote:

> What version of Pig are you using?  Starting in 0.8 Pig will combine small
> blocks into a single map.  This prevents jobs that actually are reading
> small amounts of data from taking a lot of slots on the cluster.  You can
> turn this off by adding -Dpig.noSplitCombination=true to your command line.
> Alan.
> On Mar 23, 2011, at 5:45 PM, Dexin Wang wrote:
>  And the nodes are pretty lightly loaded (~1.0) and there's plenty of free
>> memory. Now I'm seeing 2 mappers per node. Very much under-utilized.
>> On Wed, Mar 23, 2011 at 1:39 PM, Dexin Wang <[EMAIL PROTECTED]> wrote:
>>  Hi,
>>> We've seen a strange problem where some Pig jobs run fewer mappers
>>> concurrently than the mapper capacity. Specifically, we have a 10-node
>>> cluster and each node is configured for 12 mappers, so normally we have
>>> 120 mappers running. But some Pig jobs run only 10 mappers (while nothing
>>> else is running on the cluster), and it appears to be 1 mapper per node.
>>> We have not noticed the same problem with other, non-Pig Hadoop jobs.
>>> Has anyone experienced the same thing, and is there any explanation or
>>> remedy?
>>> Thanks!
>>> Dexin
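
For reference, the split-combination behavior Alan describes is controlled from the Pig command line. A minimal sketch, assuming Pig 0.8 (the script name myscript.pig is hypothetical; pig.noSplitCombination is the flag quoted in Alan's reply, and pig.maxCombinedSplitSize is, I believe, the companion Pig 0.8 property that caps how much data is combined into one map):

```shell
# Run a Pig script with small-split combining disabled entirely
# (flag from Alan's reply; myscript.pig is a hypothetical script name):
pig -Dpig.noSplitCombination=true myscript.pig

# Alternatively, keep combining but cap the combined split size
# (value is in bytes; 134217728 = 128 MB):
pig -Dpig.maxCombinedSplitSize=134217728 myscript.pig
```

Disabling combination restores one map per block (more parallelism, more scheduling overhead); capping the combined size is a middle ground that can yield more than one map per node without going back to one map per tiny file.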