MapReduce, mail # user - Resource limits with Hadoop and JVM


Forrest Aldrich 2013-09-16, 20:35
Vinod Kumar Vavilapalli 2013-09-16, 21:04
Forrest Aldrich 2013-09-16, 22:09
Re: Resource limits with Hadoop and JVM
Forrest Aldrich 2013-09-28, 04:48
I wanted to elaborate on what happened.

A Hadoop slave was added to a live cluster. As it turns out, I think its
mapred-site.xml was not configured with the correct master host. In any
case, when these commands were run:
  * $ hadoop mradmin -refreshNodes
  * $ hadoop dfsadmin -refreshNodes

the master went completely berserk, climbing to a system load of 60, at
which point it froze.
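
For anyone following along, here is a minimal sketch of the settings
involved, assuming Hadoop 1.x property names. The hostname, port, and file
paths are hypothetical and not taken from our cluster. mapred.job.tracker
is where mapred-site.xml names the master, and mapred.hosts / dfs.hosts
name the include files that the two -refreshNodes commands re-read:

```xml
<!-- mapred-site.xml (sketch; host, port, and path are hypothetical) -->
<configuration>
  <property>
    <!-- JobTracker address that every TaskTracker connects to -->
    <name>mapred.job.tracker</name>
    <value>master.example.com:9001</value>
  </property>
  <property>
    <!-- Include file re-read by "hadoop mradmin -refreshNodes" -->
    <name>mapred.hosts</name>
    <value>/etc/hadoop/conf/mapred.include</value>
  </property>
</configuration>

<!-- hdfs-site.xml (sketch; path is hypothetical) -->
<configuration>
  <property>
    <!-- Include file re-read by "hadoop dfsadmin -refreshNodes" -->
    <name>dfs.hosts</name>
    <value>/etc/hadoop/conf/dfs.include</value>
  </property>
</configuration>
```

If mapred.job.tracker on the new slave points at the wrong host, its
TaskTracker will not reach the JobTracker no matter what the include files
say, so that value is worth checking before refreshing nodes.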

This should never, ever happen -- no matter what the issue. So what
I'm trying to understand is how to prevent this while still allowing
Hadoop/Java to go about its business.

We are using an older version of Hadoop (1.0.1) so maybe we hit a bug, I
can't really tell.

I read an article about Spotify experiencing issues like this and some
of their approaches, but it's not clear to me which of them apply here
(I'm a newbie).
Thanks.

On 9/16/13 5:04 PM, Vinod Kumar Vavilapalli wrote:
> I assume you are on Linux, and that your tasks are so resource
> intensive that they are taking down nodes. You should enable
> per-task limits; see
> http://hadoop.apache.org/docs/stable/cluster_setup.html#Memory+monitoring
>
> With this enabled, jobs are forced to declare their resource
> requirements up front, and the TaskTrackers (TTs) enforce those limits.
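
A rough sketch of what such per-task limits might look like in
mapred-site.xml. The property names are the Hadoop 1.x memory-monitoring
settings as best recalled, and the values are purely illustrative
assumptions; check both against the page linked above for your version:

```xml
<!-- mapred-site.xml (sketch; all values are illustrative assumptions) -->
<configuration>
  <!-- Cluster side: virtual memory represented by one map/reduce slot -->
  <property>
    <name>mapred.cluster.map.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>mapred.cluster.reduce.memory.mb</name>
    <value>2048</value>
  </property>
  <!-- Cluster side: the most a single task is allowed to request -->
  <property>
    <name>mapred.cluster.max.map.memory.mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>mapred.cluster.max.reduce.memory.mb</name>
    <value>4096</value>
  </property>
  <!-- Job side: what each task of a job declares up front -->
  <property>
    <name>mapred.job.map.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>mapred.job.reduce.memory.mb</name>
    <value>2048</value>
  </property>
</configuration>
```

The job-side values are normally set per job rather than cluster-wide;
they appear here only to show both halves of the mechanism in one place.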
>
> HTH
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/
>
> On Sep 16, 2013, at 1:35 PM, Forrest Aldrich wrote:
>
>> We recently experienced a couple of situations that brought one or
>> more Hadoop nodes down (unresponsive).   One was related to a bug in
>> a utility we use (ffmpeg) that was resolved by compiling a new
>> version. The next, today, occurred after attempting to join a new
>> node to the cluster.
>>
>> A basic start of the (local) tasktracker and datanode did not work --
>> so based on reference, I issued: hadoop mradmin -refreshNodes, which
>> was to be followed by hadoop dfsadmin -refreshNodes.    The load
>> average literally jumped to 60 and the master (which also runs a
>> slave) became unresponsive.
>>
>> Seems to me that this should never happen.   But, looking around, I
>> saw an article from Spotify which mentioned the need to set certain
>> resource limits on the JVM as well as in the system itself
>> (limits.conf; we run RHEL). We are fairly new to Hadoop, so some
>> of these issues are new to us.
>>
>> I wonder if some of the experts here might be able to comment on this
>> issue - perhaps point out settings and other measures we can take to
>> prevent this sort of incident in the future.
>>
>> Our setup is not complicated. We have 3 Hadoop nodes; the first is
>> both the master and a slave (it has more resources, too). The
>> underlying workload splits tasks up and hands them to ffmpeg (which
>> is another issue, as it tends to eat resources, but so far, with the
>> recompile, we are good). We have two more hardware nodes to add shortly.
>>
>>
>> Thanks!