I wanted to elaborate on what happened.
A Hadoop slave was added to a live cluster. It turns out, I think, that
its mapred-site.xml was not configured with the correct master host. In
any case, when these commands were run:
  $ hadoop mradmin -refreshNodes
  $ hadoop dfsadmin -refreshNodes
The master went completely berserk, up to a system load of 60, where it
became unresponsive.
This should never, ever happen -- no matter what the issue. So what
I'm trying to understand is how to prevent this while still allowing
Hadoop/Java to go about its business.
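For what it's worth, a small pre-flight check might have caught the wrong
master host before the refresh. This is only a sketch: it assumes the
JobTracker address is set through the mapred.job.tracker property in
mapred-site.xml, and the function names jobtracker_address and
check_master, as well as the example address master01:9001 used below,
are made up for illustration.

```shell
# Hypothetical pre-flight check: read mapred.job.tracker from a
# mapred-site.xml and compare it with the address we expect the master
# to have. Function names and example values are illustrative only.

# jobtracker_address FILE -> prints the <value> of mapred.job.tracker
jobtracker_address() {
    # Flatten the file to one line so a name/value pair split across
    # lines still matches, then extract the value after the property name.
    tr -d '\n' < "$1" |
        sed -n 's#.*<name>mapred\.job\.tracker</name>[[:space:]]*<value>\([^<]*\)</value>.*#\1#p'
}

# check_master FILE EXPECTED -> returns 0 only if the configured address matches
check_master() {
    actual=$(jobtracker_address "$1")
    if [ "$actual" = "$2" ]; then
        echo "ok: mapred.job.tracker is $actual"
    else
        echo "mismatch: mapred.job.tracker is '$actual', expected '$2'" >&2
        return 1
    fi
}
```

It could be run as check_master "$HADOOP_CONF_DIR/mapred-site.xml"
master01:9001 before issuing the refreshNodes commands. It would not
explain the load spike itself, only rule out one suspected cause.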
We are using an older version of Hadoop (1.0.1), so maybe we hit a bug;
I can't really tell.
I read an article about Spotify experiencing issues like this and some
of their approaches, but it's not clear to me which of them apply here
(I'm a newbie).
On 9/16/13 5:04 PM, Vinod Kumar Vavilapalli wrote:
> I assume you are on Linux. Also assuming that your tasks are so
> resource intensive that they are taking down nodes. You should enable
> limits per task, see
> With this enabled, jobs are forced to declare their resource
> requirements up front, and the TTs (TaskTrackers) enforce those limits.
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> On Sep 16, 2013, at 1:35 PM, Forrest Aldrich wrote:
>> We recently experienced a couple of situations that brought one or
>> more Hadoop nodes down (unresponsive). One was related to a bug in
>> a utility we use (ffmpeg) that was resolved by compiling a new
>> version. The next, today, occurred after attempting to join a new
>> node to the cluster.
>> A basic start of the (local) tasktracker and datanode did not work --
>> so based on reference, I issued: hadoop mradmin -refreshNodes, which
>> was to be followed by hadoop dfsadmin -refreshNodes. The load
>> average literally jumped to 60 and the master (which also runs a
>> slave) became unresponsive.
>> Seems to me that this should never happen. But, looking around, I
>> saw an article from Spotify which mentioned the need to set certain
>> resource limits on the JVM as well as in the system itself
>> (limits.conf, we run RHEL). I (and we) are fairly new to Hadoop,
>> so some of these issues are very new.
>> I wonder if some of the experts here might be able to comment on this
>> issue - perhaps point out settings and other measures we can take to
>> prevent this sort of incident in the future.
>> Our setup is not complicated. We have 3 Hadoop nodes; the first is
>> both a master and a slave (it has more resources, too). What the
>> system does underneath is split up tasks for ffmpeg (which is another
>> issue, as it tends to eat resources, but so far, with the recompile,
>> we are good). We have two more hardware nodes to add shortly.