Pig user mailing list - possibly Pig throttles the number of mappers


Re: possibly Pig throttles the number of mappers
Dexin Wang 2011-03-24, 00:58
Thanks Alan!

We are using 0.79. We also got an answer on the #hadoop channel, along with
this Quora answer:

http://www.quora.com/Where-does-Hadoop-latency-come-from-e-g-it-takes-15-25-seconds-for-an-empty-job?q=hadoop+latency

We will look into combining more work in each mapper and/or using Pig 0.8.
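
In case it helps anyone else who finds this thread, here is a rough sketch of
what that could look like with Pig 0.8's split combination. The property name
pig.maxCombinedSplitSize is from the Pig 0.8 performance notes, and the script
name is just a placeholder, so treat both as assumptions rather than a tested
recipe:

  -- in the Pig script: pack small input splits up to ~256 MB per map task
  set pig.maxCombinedSplitSize 268435456;

  -- or pass the same property on the command line when launching the job
  pig -Dpig.maxCombinedSplitSize=268435456 myscript.pig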

Thanks again for your help.

Dexin

On Wed, Mar 23, 2011 at 5:55 PM, Alan Gates <[EMAIL PROTECTED]> wrote:

> What version of Pig are you using?  Starting in 0.8, Pig will combine small
> blocks into a single map.  This prevents jobs that are actually reading
> small amounts of data from taking up a lot of slots on the cluster.  You can
> turn this off by adding -Dpig.noSplitCombination=true to your command line.
>
> Alan.
>
>
> On Mar 23, 2011, at 5:45 PM, Dexin Wang wrote:
>
>  And the nodes are pretty lightly loaded (load average ~1.0) and there's
>> plenty of free memory. Now I'm seeing 2 mappers per node. Very much
>> under-utilized.
>>
>> On Wed, Mar 23, 2011 at 1:39 PM, Dexin Wang <[EMAIL PROTECTED]> wrote:
>>
>>  Hi,
>>>
>>> We've seen a strange problem where some Pig jobs run fewer mappers
>>> concurrently than the cluster's mapper capacity. Specifically, we have a
>>> 10-node cluster and each node is configured with 12 mapper slots, so
>>> normally we have 120 mappers running. But some Pig jobs will only run 10
>>> mappers (while nothing else is running), which appears to be 1 mapper per
>>> node.
>>>
>>> We have not noticed the same problem with other non-Pig Hadoop jobs.
>>> Has anyone experienced the same thing, and does anyone have an
>>> explanation or remedy?
>>>
>>> Thanks!
>>> Dexin
>>>
>>>
>
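
For completeness, Alan's suggestion above would look roughly like this on the
command line if you wanted split combination turned off for a particular job
(the script name here is only a placeholder):

  pig -Dpig.noSplitCombination=true myscript.pig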