Some jobs seem to run forever

Steve Lewis     2013-08-29, 02:47
Charles Baker   2013-08-29, 16:29

Re: Some jobs seem to run forever
As I said in the original message, bad partitioning was my original theory; I
have had issues with it in the past and am careful with my partitioner. It
was the first thing I looked for, but I do not see any evidence that the
slower jobs have significantly more data than the faster ones, and certainly
not enough to justify a radically different running time.
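
For context, the sketch below mirrors Hadoop's default HashPartitioner, i.e.
the kind of hash-based partitioner being discussed; the Text key and value
types are assumptions, and this is not necessarily the partitioner used in
the job above.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Every key whose hash maps to the same value modulo the reducer count lands
// on the same reducer, so a few very frequent keys (or a weak hashCode) can
// concentrate work on a handful of reducers.
public class TextHashPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks) {
        // Mask the sign bit so the partition index is always non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}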
On Thu, Aug 29, 2013 at 9:29 AM, Charles Baker <[EMAIL PROTECTED]> wrote:

>  Hi Steve. Sounds like a classic case of uneven data distribution among
> the reducers. Most of your data is probably going to those 10 reducers that
> are taking many hours. You may want to adjust your key and/or partitioning
> strategy to better distribute the data amongst the reducers. If you're
> using a hashing type of partitioning strategy, think about using a prime
> number of reducers. With a prime number of buckets, hash values taken
> modulo the reducer count tend to spread out more evenly, and this alone may
> get you pretty far. I have no idea what your workflow or cluster
> configuration is like, but 300 reducers for 300 mappers doesn't sound
> right. Try using a (prime) number of reducers that's roughly equal to 95%
> of the total reducer slots allocated on the cluster and go from there.
> Usually, the cluster should be configured for fewer reducers than mappers.
> If you have 12 cores per node (HT off), try 8 mappers and 3 reducers per
> node. [A sketch of this setup appears at the end of the thread below.]
>
> Good luck!
>
> Chuck
>
> *From:* Steve Lewis [mailto:[EMAIL PROTECTED]]
> *Sent:* Wednesday, August 28, 2013 7:48 PM
> *To:* mapreduce-user
> *Subject:* Some jobs seem to run forever
>
> I have an issue: I am running a Hadoop job on a 40-node cluster with
> about 300 map tasks and about 300 reduce tasks. Most tasks complete within
> 20 minutes, but a few, typically fewer than 10, run for many hours.
>
> If they do complete, I see nothing to suggest that the number of bytes read
> or written, or the number of records read or written, is significantly
> different from tasks that run much faster. I sometimes see multiple
> attempts - usually only two - and the cluster is doing nothing else.
>
> Any suggested tuning?

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
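
To make Chuck's suggestion concrete, here is a minimal sketch of sizing the
reduce phase to a prime number of reducers just under the cluster's reducer
slot count. The 40 x 3 slot arithmetic and the job name are assumptions drawn
from the numbers mentioned in this thread, not the actual job configuration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerCountSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "skew-check");  // hypothetical job name

        // Assumed cluster: 40 nodes x 3 reduce slots per node = 120 reducer
        // slots. Roughly 95% of 120 is 114; the nearest prime below is 113.
        job.setNumReduceTasks(113);

        // Mapper, reducer, input and output paths would be set here as usual.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

On an MRv1 cluster, the per-node slot counts Chuck mentions (8 map slots, 3
reduce slots) would come from mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum in mapred-site.xml; that detail is an
assumption about the cluster setup, not something stated in the thread.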