Pig >> mail # user >> possibly Pig throttles the number of mappers


Re: possibly Pig throttles the number of mappers
Thanks Alan!

We are using 0.79. I also got an answer in the #hadoop channel, along with
a pointer to this Quora answer:

http://www.quora.com/Where-does-Hadoop-latency-come-from-e-g-it-takes-15-25-seconds-for-an-empty-job?q=hadoop+latency

We will look into combining more work in each mapper and/or using Pig 0.8.

Thanks again for your help.

Dexin

On Wed, Mar 23, 2011 at 5:55 PM, Alan Gates <[EMAIL PROTECTED]> wrote:

> What version of Pig are you using?  Starting in 0.8 Pig will combine small
> blocks into a single map.  This prevents jobs that actually are reading
> small amounts of data from taking a lot of slots on the cluster.  You can
> turn this off by adding -Dpig.noSplitCombination=true to your command line.
>
> Alan.
>
>
> On Mar 23, 2011, at 5:45 PM, Dexin Wang wrote:
>
>> And the nodes are pretty lightly loaded (~1.0) and there's plenty of free
>> memory. Now I'm seeing 2 mappers per node. Very much under-utilized.
>>
>> On Wed, Mar 23, 2011 at 1:39 PM, Dexin Wang <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> We've seen a strange problem where some Pig jobs run fewer mappers
>>> concurrently than the cluster's mapper capacity. Specifically, we have a
>>> 10-node cluster and each node is configured to have 12 mappers. Normally
>>> we have 120 mappers running. But some Pig jobs have only 10 mappers
>>> running (while nothing else is running), which appears to be 1 mapper
>>> per node.
>>>
>>> We have not noticed the same problem with non-Pig Hadoop jobs. Has anyone
>>> experienced the same thing, and is there any explanation or remedy?
>>>
>>> Thanks!
>>> Dexin
>>>
>>>
>
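
To illustrate Alan's point about combining small blocks into a single map, here is a rough Python sketch of why a job over many small blocks can end up with far fewer mappers than the cluster has slots. It is not Pig's actual implementation: the function name combine_splits, the greedy packing strategy, the 8 MB block size, and the 128 MB cap are all assumptions chosen for the example.

```python
def combine_splits(block_sizes, max_combined_size):
    """Greedily pack consecutive blocks into combined splits.

    Hypothetical helper for illustration only; Pig's real split
    combination logic may differ.
    """
    splits = []
    current = []
    current_size = 0
    for size in block_sizes:
        # Start a new split when adding this block would exceed the cap.
        if current and current_size + size > max_combined_size:
            splits.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        splits.append(current)
    return splits


MB = 1024 * 1024

# Assumed input: 120 small blocks of 8 MB each.
blocks = [8 * MB] * 120

# Without combination, each block would get its own map task.
uncombined_maps = len(blocks)

# With combination (assumed 128 MB cap), many blocks share one map task.
combined_maps = len(combine_splits(blocks, 128 * MB))

print(uncombined_maps, combined_maps)
```

Under these assumed numbers the job needs only a handful of map tasks, so most of the 120 slots described in the thread would sit idle. As Alan notes, passing -Dpig.noSplitCombination=true on the command line turns the combining off.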