Re: Resource limits with Hadoop and JVM
Forrest Aldrich 2013-09-16, 22:09
Yes, I mentioned below we're running RHEL.
In this case, when I went to add the node, I ran "hadoop mradmin
-refreshNodes" (as user hadoop) and the master node went completely nuts
- the system load jumped to 60 ("top" was frozen on the console) and
required a hard reboot.
Whether or not the slave node I added had errors in its *.xml
configuration, this should never happen. At least, I would like it if
it never happened.
java version "1.6.0_39"
Java(TM) SE Runtime Environment (build 1.6.0_39-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed mode)
Perhaps we ran into a bug? I know we need to upgrade, but we're being
very cautious about changes to the production environment. It's an "if
it works, don't fix it" type of approach.
On 9/16/13 5:04 PM, Vinod Kumar Vavilapalli wrote:
> I assume you are on Linux. Also assuming that your tasks are so
> resource-intensive that they are taking down nodes. You should enable
> limits per task, see
> What it does is force jobs to declare their resource requirements up
> front, and the TTs then enforce those limits.
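> As a rough sketch (the property names below are from the Hadoop 1.x
> task memory-monitoring feature, and the values are only illustrative,
> so check both against the docs for your exact version), mapred-site.xml
> would carry something like:
>
>   <!-- virtual memory, in MB, that one map/reduce slot represents on each TT -->
>   <property>
>     <name>mapred.cluster.map.memory.mb</name>
>     <value>2048</value>
>   </property>
>   <property>
>     <name>mapred.cluster.reduce.memory.mb</name>
>     <value>2048</value>
>   </property>
>   <!-- the most a single job is allowed to request per task -->
>   <property>
>     <name>mapred.cluster.max.map.memory.mb</name>
>     <value>4096</value>
>   </property>
>   <property>
>     <name>mapred.cluster.max.reduce.memory.mb</name>
>     <value>4096</value>
>   </property>
>   <!-- default per-task request for jobs; the TT monitors task process
>        trees and kills tasks that exceed what they asked for -->
>   <property>
>     <name>mapred.job.map.memory.mb</name>
>     <value>2048</value>
>   </property>
>   <property>
>     <name>mapred.job.reduce.memory.mb</name>
>     <value>2048</value>
>   </property>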
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> On Sep 16, 2013, at 1:35 PM, Forrest Aldrich wrote:
>> We recently experienced a couple of situations that brought one or
>> more Hadoop nodes down (unresponsive). One was related to a bug in
>> a utility we use (ffmpeg) that was resolved by compiling a new
>> version. The next, today, occurred after attempting to join a new
>> node to the cluster.
>> A basic start of the (local) tasktracker and datanode did not work --
>> so, based on a reference I found, I issued hadoop mradmin
>> -refreshNodes, which was to be followed by hadoop dfsadmin
>> -refreshNodes. The load average literally jumped to 60 and the master
>> (which also runs a slave) became unresponsive.
>> Seems to me that this should never happen. But, looking around, I
>> saw an article from Spotify which mentioned the need to set certain
>> resource limits on the JVM as well as in the system itself
>> (limits.conf; we run RHEL). I (and we) are fairly new to Hadoop,
>> so some of these issues are new to us.
>> I wonder if some of the experts here might be able to comment on this
>> issue - perhaps point out settings and other measures we can take to
>> prevent this sort of incident in the future.
>> Our setup is not complicated. We have 3 Hadoop nodes; the first is
>> also a master and a slave (it has more resources, too). The
>> underlying system splits up tasks and hands them off to ffmpeg (which
>> is another issue, as it tends to eat resources, but so far with a
>> recompile we are good). We have two more hardware nodes to add
>> shortly.