Panshul Whisper 2013-01-15, 03:04
bejoy.hadoop@... 2013-01-15, 03:11
Panshul Whisper 2013-01-15, 03:31
Harsh J 2013-01-15, 04:36
anil gupta 2013-01-15, 04:35
nagarjuna kanamarlapudi 2013-01-15, 03:50
Harsh J 2013-01-16, 04:14
I feel the most reliable approach is to use the NN-HA features with shared storage. The idea is to have two Namenodes: both the Active and the Standby (Secondary) Namenodes point to the shared device, and the editlogs are written to it. When the Active crashes, the Standby takes over, becomes Active, and continues serving clients reliably without much interruption.
One possible approach is to use BookKeeper as the shared storage device:
From: Harsh J [[EMAIL PROTECTED]]
Sent: Wednesday, January 16, 2013 9:44 AM
To: <[EMAIL PROTECTED]>
Subject: Re: hadoop namenode recovery
The NFS mount is to be soft-mounted, so if the NFS goes down, the NN ejects it and continues with the local disk. If auto-restore is configured, it will re-add the NFS if it's detected to be good again later.
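A minimal sketch of the auto-restore setting referred to above, assuming a 1.x release, where the property is named dfs.name.dir.restore (2.x releases call it dfs.namenode.name.dir.restore). Check the hdfs-default.xml shipped with your release for the exact name before relying on it:

```xml
<!-- hdfs-site.xml (sketch): ask the NN to retry a previously failed
     name directory, such as an NFS mount that has come back.
     Property name assumes Hadoop 1.x. -->
<property>
  <name>dfs.name.dir.restore</name>
  <value>true</value>
</property>
```

The soft mount itself is set on the OS side through the NFS mount options (the "soft" option), not in the Hadoop configuration.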
On Wed, Jan 16, 2013 at 7:04 AM, randy <[EMAIL PROTECTED]> wrote:
What happens to the NN and/or performance if there's a problem with the NFS server? Or the network?
On 01/14/2013 11:36 PM, Harsh J wrote:
It's very rare to observe an NN crash due to a software bug in
production. Most of the time it's a hardware fault you should worry about.
On 1.x, or any non-HA-carrying release, the best you can get to
safeguard against a total loss is to have redundant disk volumes
configured, one preferably over a dedicated remote NFS mount. This way
the NN is recoverable after the node goes down, since you can retrieve a
current copy from another machine (i.e. via the NFS mount) and set a new
node up to replace the older NN and continue along.
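The redundant-volume setup described above could look roughly like the following on 1.x. This is a sketch only: both paths are hypothetical placeholders, with the second assumed to be a directory on the dedicated NFS mount.

```xml
<!-- hdfs-site.xml (sketch, Hadoop 1.x property name).
     The NN writes its image and edits to every directory listed.
     /data/1/dfs/nn  : hypothetical local disk
     /mnt/nfs/dfs/nn : hypothetical directory on the remote NFS mount -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>
```

If the NN host is lost, the copy under the NFS directory is what a replacement node would be started from.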
A load balancer will not work as the NN is not a simple webserver - it
maintains state which you cannot sync. We wrote HA-HDFS features to
address the very concern you have.
If you want true, painless HA, branch-2 is your best bet at this point.
An upcoming 2.0.3 release should include the QJM-based HA features, which
are painless to set up and very reliable to use (over other options), and
works with commodity level hardware. FWIW, we've (my team and I) been
supporting several users and customers who're running the 2.x based HA
in production and other types of environments and it has been very
stable in our experience. There are also some folks in the community
running 2.x based HDFS for HA/else.
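For orientation, a QJM-based HA setup on 2.x is configured along the following lines. This is a hedged sketch, not a complete or verified configuration: the nameservice ID "mycluster", the NameNode IDs "nn1"/"nn2", and all host names are hypothetical, and the property names follow the 2.x HA documentation as I understand it, so confirm them against your release.

```xml
<!-- hdfs-site.xml (sketch): two NameNodes sharing edits via a JournalNode quorum.
     All IDs and host names below are hypothetical placeholders. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>nn2.example.com:8020</value>
</property>
<!-- The Active NN writes edits to a majority of these JournalNodes;
     the Standby reads them from there. -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
</property>
<!-- Lets HDFS clients locate whichever NN is currently Active. -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

An odd number of JournalNodes (three here) is the usual choice, since the quorum tolerates the loss of a minority of them; they can run on commodity hardware, which avoids the dependency on a single NFS filer.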
On Tue, Jan 15, 2013 at 6:55 AM, Panshul Whisper <[EMAIL PROTECTED]> wrote:
Is there a standard way to prevent a Namenode crash in
a Hadoop cluster?
Or what is the standard or best practice for overcoming the single
point of failure problem in Hadoop?
I am not ready to take chances on a production server with Hadoop
2.0 Alpha release, which claims to have solved the problem. Are
there any other things I can do to either prevent the failure or
recover from the failure in a very short time?
Michel Segel 2013-01-17, 10:26
Harsh J 2013-01-17, 13:38