Re: Resource limits with Hadoop and JVM
Forrest Aldrich 2013-09-16, 22:09
Yes, I mentioned below we're running RHEL.

In this case, when I went to add the node, I ran "hadoop mradmin
-refreshNodes" (as user hadoop) and the master node went completely nuts
-- the system load jumped to 60 ("top" was frozen on the console) and it
required a hard reboot.

Whether or not the slave node I added had errors in the *.xml, this
should never happen.  At least, I would like it if it never happened
again ;-)
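
(For context, my understanding is that "hadoop mradmin -refreshNodes"
simply rereads the include/exclude files named in mapred-site.xml.  A
rough sketch of the relevant properties -- the file paths here are
hypothetical examples, not our actual config:

    <!-- mapred-site.xml: node include/exclude lists reread by
         "hadoop mradmin -refreshNodes"; paths are illustrative -->
    <property>
      <name>mapred.hosts</name>
      <value>/etc/hadoop/allowed-hosts</value>
    </property>
    <property>
      <name>mapred.hosts.exclude</name>
      <value>/etc/hadoop/excluded-hosts</value>
    </property>

The HDFS side uses dfs.hosts / dfs.hosts.exclude the same way, reread
by "hadoop dfsadmin -refreshNodes".)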

We're running:

java version "1.6.0_39"
Java(TM) SE Runtime Environment (build 1.6.0_39-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed mode)

Hadoop v1.0.1

Perhaps we ran into a bug?  I know we need to upgrade, but we're being
very cautious about changes to the production environment -- an "if it
works, don't fix it" type of approach.

Thanks,

Forrest

On 9/16/13 5:04 PM, Vinod Kumar Vavilapalli wrote:
> I assume you are on Linux, and that your tasks are resource-intensive
> enough to take down nodes. You should enable per-task limits; see
> http://hadoop.apache.org/docs/stable/cluster_setup.html#Memory+monitoring
>
> With this enabled, jobs are forced to declare their resource
> requirements up front, and the TaskTrackers (TTs) enforce those limits.
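>
> A rough sketch of the relevant mapred-site.xml properties on Hadoop
> 1.x (the values here are illustrative, not tuned recommendations; see
> the doc above for the exact semantics):
>
>   <!-- slot sizes the cluster advertises, in MB -->
>   <property>
>     <name>mapred.cluster.map.memory.mb</name>
>     <value>2048</value>
>   </property>
>   <property>
>     <name>mapred.cluster.reduce.memory.mb</name>
>     <value>2048</value>
>   </property>
>   <!-- hard ceiling on what a single job may request -->
>   <property>
>     <name>mapred.cluster.max.map.memory.mb</name>
>     <value>4096</value>
>   </property>
>   <property>
>     <name>mapred.cluster.max.reduce.memory.mb</name>
>     <value>4096</value>
>   </property>
>
> Jobs then declare their needs via mapred.job.map.memory.mb and
> mapred.job.reduce.memory.mb, and the TT kills tasks that exceed what
> they asked for.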
>
> HTH
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/
>
> On Sep 16, 2013, at 1:35 PM, Forrest Aldrich wrote:
>
>> We recently experienced a couple of situations that brought one or
>> more Hadoop nodes down (unresponsive).   One was related to a bug in
>> a utility we use (ffmpeg) that was resolved by compiling a new
>> version. The next, today, occurred after attempting to join a new
>> node to the cluster.
>>
>> A basic start of the (local) tasktracker and datanode did not work --
>> so based on a reference, I issued hadoop mradmin -refreshNodes, to be
>> followed by hadoop dfsadmin -refreshNodes.  The load average jumped
>> to 60 and the master (which also runs a slave) became unresponsive.
>>
>> Seems to me that this should never happen.  But, looking around, I
>> saw an article from Spotify which mentioned the need to set certain
>> resource limits on the JVM as well as in the system itself
>> (limits.conf; we run RHEL).  I (and we) are fairly new to Hadoop,
>> so some of these issues are new to us.
>>
>> I wonder if some of the experts here might be able to comment on this
>> issue - perhaps point out settings and other measures we can take to
>> prevent this sort of incident in the future.
>>
>> Our setup is not complicated.  We have 3 Hadoop nodes; the first is
>> also a master and a slave (and has more resources, too).  The
>> underlying system splits tasks out to ffmpeg (which is another issue,
>> as it tends to eat resources, but so far, with a recompile, we are
>> good).  We have two more hardware nodes to add shortly.
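>>
>> One knob that may help with runaway ffmpeg children: Hadoop 1.x can
>> apply a ulimit to each task JVM and anything it spawns, via
>> mapred.child.ulimit.  A sketch with an illustrative value (in KB; it
>> must be at least as large as the task JVM's -Xmx):
>>
>>   <!-- virtual-memory ulimit, in KB, applied to each task and its
>>        child processes (e.g. ffmpeg); value is an example only -->
>>   <property>
>>     <name>mapred.child.ulimit</name>
>>     <value>4194304</value>
>>   </property>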
>>
>>
>> Thanks!