MapReduce, mail # user - dynamically resizing the Hadoop cluster?


Re: dynamically resizing the Hadoop cluster?
Nan Zhu 2013-10-24, 21:07
Good explanation,

Thank you, Ravi

Best,
On Thu, Oct 24, 2013 at 4:51 PM, Ravi Prakash <[EMAIL PROTECTED]> wrote:

> Hi Nan!
>
> If the task trackers stop heartbeating back to the JobTracker, the
> JobTracker will mark them as dead and reschedule the tasks which were
> running on that TaskTracker. Admittedly there is some delay between when
> the TaskTrackers stop heartbeating back and when the JobTracker marks them
> dead. This is controlled by the mapred.tasktracker.expiry.interval
> parameter (I'm assuming you are using Hadoop 1.x).
>
> HTH
> Ravi
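The expiry setting Ravi mentions is tuned in mapred-site.xml. A minimal sketch, assuming Hadoop 1.x, where the value is in milliseconds and the default is 600000 (10 minutes):

```xml
<!-- mapred-site.xml: how long the JobTracker waits without a heartbeat
     before marking a TaskTracker dead and rescheduling its tasks -->
<property>
  <name>mapred.tasktracker.expiry.interval</name>
  <!-- 60000 ms = 1 minute, chosen here only for illustration;
       the Hadoop 1.x default is 600000 ms (10 minutes) -->
  <value>60000</value>
</property>
```

Lowering the interval makes the JobTracker notice terminated instances sooner, at the cost of false positives during long GC pauses or transient network problems.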
>
>   On Thursday, October 24, 2013 1:21 PM, Nan Zhu <[EMAIL PROTECTED]>
> wrote:
>  Hi, Ravi,
>
> Thank you for the reply
>
> Actually I'm not running HDFS on EC2, instead I use S3 to store data
>
> I'm curious about this: if some nodes are decommissioned, will the
> JobTracker treat the tasks that were running on them as "too slow" (since
> they make no progress for a long time) and launch speculative execution,
> OR will it directly treat them as "belonging to a running job but run on a
> dead TaskTracker"?
>
> Best,
>
> Nan
>
> On Thu, Oct 24, 2013 at 2:04 PM, Ravi Prakash <[EMAIL PROTECTED]> wrote:
>
> Hi Nan!
>
> Usually nodes are decommissioned slowly over some period of time so as not
> to disrupt the running jobs. When a node is decommissioned, the NameNode
> must re-replicate all under-replicated blocks. Rather than suddenly remove
> half the nodes, you might want to take a few nodes offline at a time.
> Hadoop should be able to handle rescheduling tasks on nodes no longer
> available (even without speculative execution. Speculative execution is for
> something else).
>
> HTH
> Ravi
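The gradual decommission Ravi describes is driven by an exclude file plus a refresh command. A sketch, assuming Hadoop 1.x; the path /etc/hadoop/excludes is a hypothetical choice, not a fixed location. Since Nan stores data in S3, only the MapReduce side applies here; on a cluster with HDFS you would also set dfs.hosts.exclude and run hadoop dfsadmin -refreshNodes so the NameNode can re-replicate blocks first:

```xml
<!-- mapred-site.xml: file listing hostnames of TaskTrackers to retire -->
<property>
  <name>mapred.hosts.exclude</name>
  <!-- hypothetical path; point this at any file the JobTracker can read -->
  <value>/etc/hadoop/excludes</value>
</property>
```

After adding the hostnames of the nodes to retire to that file, run `hadoop mradmin -refreshNodes` on the JobTracker. The excluded TaskTrackers stop receiving new tasks and can then be shut down a few at a time, as suggested above.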
>
>
>   On Wednesday, October 23, 2013 10:26 PM, Nan Zhu <[EMAIL PROTECTED]>
> wrote:
>   Hi, all
>
> I’m running a Hadoop cluster on AWS EC2,
>
> I would like to dynamically resize the cluster so as to reduce the cost;
> is there any solution to achieve this?
>
> E.g., I would like to cut the cluster size in half. Is it safe to just
> shut down the instances (if some tasks are running on them, can I rely
> on speculative execution to re-run them on the other nodes?)
>
> I cannot use EMR, since I’m running a customized version of Hadoop
>
> Best,
>
> --
> Nan Zhu
> School of Computer Science,
> McGill University
>
>
> --
> Nan Zhu
> School of Computer Science,
> McGill University
> E-Mail: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
>
--
Nan Zhu
School of Computer Science,
McGill University
E-Mail: [EMAIL PROTECTED] <[EMAIL PROTECTED]>