

Re: hadoop namenode recovery
It's very rare to observe an NN crash due to a software bug in production.
Most of the time it's a hardware fault you should worry about.

On 1.x, or any release without HA support, the best you can do to safeguard
against a total loss is to configure redundant disk volumes for the NN
metadata, one preferably on a dedicated remote NFS mount. This way the NN is
recoverable after the node goes down, since you can retrieve a current copy
from another machine (i.e. via the NFS mount) and set up a new node to
replace the old NN and continue along.
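
As a rough sketch of the redundant-volume setup described above: on 1.x the
NN metadata directories are listed, comma-separated, in the dfs.name.dir
property of hdfs-site.xml (dfs.namenode.name.dir on 2.x). The small script
below only renders such a property. The two paths are hypothetical
placeholders, not values from this thread, and the NFS mount itself has to
be set up separately.

```python
import xml.etree.ElementTree as ET

# Hypothetical paths (assumptions): one local volume and one NFS mount point.
NAME_DIRS = ["/data/1/dfs/nn", "/mnt/nfs/dfs/nn"]


def build_hdfs_site(name_dirs):
    """Return hdfs-site.xml text with dfs.name.dir set to all given directories."""
    root = ET.Element("configuration")
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = "dfs.name.dir"
    # The NN keeps a copy of its image and edit log in every listed directory.
    ET.SubElement(prop, "value").text = ",".join(name_dirs)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_hdfs_site(NAME_DIRS))
```

On a replacement node, the same property would point at a directory holding
the copy retrieved from the NFS mount.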

A load balancer will not work as the NN is not a simple webserver - it
maintains state which you cannot sync. We wrote HA-HDFS features to address
the very concern you have.
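
To illustrate the state problem with a toy model (this is not HDFS code, and
ToyNameNode is a made-up class): two independent servers behind a round-robin
balancer each keep their own in-memory namespace, so a file created through
one is invisible through the other.

```python
import itertools


class ToyNameNode:
    """Toy stand-in for a NameNode: the namespace lives in this process only."""

    def __init__(self):
        self.namespace = set()

    def create(self, path):
        self.namespace.add(path)

    def exists(self, path):
        return path in self.namespace


def round_robin_demo():
    """Send a create to one backend, then a lookup to the next one."""
    backends = itertools.cycle([ToyNameNode(), ToyNameNode()])
    next(backends).create("/user/data.txt")          # request 1 hits backend A
    return next(backends).exists("/user/data.txt")   # request 2 hits backend B


if __name__ == "__main__":
    print(round_robin_demo())  # False: backend B never saw the create
```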

If you want true, painless HA, branch-2 is your best bet at this point. An
upcoming 2.0.3 release should include the QJM-based HA features, which are
painless to set up, very reliable to use (compared with other options), and
work with commodity-level hardware. FWIW, my team and I have been supporting
several users and customers who are running the 2.x-based HA in production
and other types of environments, and it has been very stable in our
experience. There are also some folks in the community running 2.x-based
HDFS, for HA or otherwise.
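
For a feel of what the QJM-based setup involves, here is a minimal sketch
that renders the core hdfs-site.xml properties as I understand them from the
2.x HA documentation. The nameservice ID, NN IDs and host names are
hypothetical, and the list is incomplete: a real deployment also needs a
fencing method (dfs.ha.fencing.methods) and a local JournalNode directory
(dfs.journalnode.edits.dir), so check the documentation for the release you
actually run.

```python
import xml.etree.ElementTree as ET

PROXY_PROVIDER = (
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)


def qjm_ha_properties(nameservice, namenodes, journalnodes):
    """Build core QJM HA properties.

    namenodes: dict of NN id -> "host:port" RPC address
    journalnodes: list of "host:port" JournalNode addresses
    """
    props = {
        "dfs.nameservices": nameservice,
        "dfs.ha.namenodes." + nameservice: ",".join(namenodes),
        # Both NNs share edits through the JournalNode quorum at this URI.
        "dfs.namenode.shared.edits.dir":
            "qjournal://" + ";".join(journalnodes) + "/" + nameservice,
        "dfs.client.failover.proxy.provider." + nameservice: PROXY_PROVIDER,
    }
    for nn_id, addr in namenodes.items():
        props["dfs.namenode.rpc-address.%s.%s" % (nameservice, nn_id)] = addr
    return props


def to_xml(props):
    """Render a property dict as hdfs-site.xml text."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    print(to_xml(qjm_ha_properties(
        "mycluster",
        {"nn1": "nn1.example.com:8020", "nn2": "nn2.example.com:8020"},
        ["jn1.example.com:8485", "jn2.example.com:8485", "jn3.example.com:8485"],
    )))
```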

On Tue, Jan 15, 2013 at 6:55 AM, Panshul Whisper <[EMAIL PROTECTED]> wrote:

> Hello,
>
> Is there a standard way to prevent a Namenode crash in a Hadoop cluster?
> Or what is the standard or best practice for overcoming the single point
> of failure problem of Hadoop?
>
> I am not ready to take chances on a production server with Hadoop 2.0
> Alpha release, which claims to have solved the problem. Are there any other
> things I can do to either prevent the failure or recover from the failure
> in a very short time?
>
> Thanking You,
>
> --
> Regards,
> Ouch Whisper
> 010101010101
>

--
Harsh J