MapReduce, mail # user - HDFS HA IO Fencing


Re: HDFS HA IO Fencing
Balaji Narayanan 2012-10-27, 17:34
If you use NFSv4 you should be able to use locks, and when a machine dies or
fails to renew its lease, the other machine can take over.

On Friday, October 26, 2012, Todd Lipcon wrote:

> NFS locks typically last forever if you disconnect abruptly. So they are
> not sufficient -- your standby wouldn't be able to take over without manual
> intervention to remove the lock.
>
> If you want to build an unreliable system that might corrupt your data,
> you could set up 'shell(/bin/true)' as a second fencing method. But, it's
> really a bad idea. There are failure scenarios which could cause split
> brain if you do this, and you'd very likely lose data.
>
> -Todd
>
> On Fri, Oct 26, 2012 at 1:59 AM, lei liu <[EMAIL PROTECTED]> wrote:
>
>> We are using NFS for shared storage. Can we use the Linux nfslock service to
>> implement IO fencing?
>>
>>
>> 2012/10/26 Steve Loughran <[EMAIL PROTECTED]>
>>
>>>
>>>
>>> On 25 October 2012 14:08, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi Liu,
>>>>
>>>> Locks are not sufficient, because there is no way to enforce a lock in
>>>> a distributed system without unbounded blocking. What you might be
>>>> referring to is a lease, but leases are still problematic unless you can
>>>> put bounds on the speed with which clocks progress on different machines,
>>>> _and_ have strict guarantees on the way each node's scheduler works. With
>>>> Linux and Java, the latter is tough.
>>>>
>>>>
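
To make the clock problem above concrete, here is a minimal Python sketch of a lease checked against each node's own clock. It is a toy model only, not how HDFS or NFS implement anything: the `Node` class, the `clock_rate` field, the node names and the 10-second lease are all assumptions made up for illustration. A slow clock here also stands in for a process that was paused by the scheduler or a GC.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    clock_rate: float  # local seconds that pass per real second (1.0 = perfect clock)

    def local_time(self, real_time: float) -> float:
        return real_time * self.clock_rate


def believes_lease_valid(node: Node, granted_at_real: float,
                         lease_seconds: float, now_real: float) -> bool:
    """True if the node's own clock says the lease has not expired yet."""
    elapsed_local = node.local_time(now_real) - node.local_time(granted_at_real)
    return elapsed_local < lease_seconds


LEASE_SECONDS = 10.0                       # hypothetical lease length
old_active = Node("nn1", clock_rate=0.5)   # clock runs slow (or the process stalled)
standby = Node("nn2", clock_rate=1.0)      # clock runs at the true rate

now = 12.0  # real seconds since the lease was granted at t=0

# The standby has seen 12s pass, so it considers the lease expired and takes over.
standby_sees_expired = not believes_lease_valid(standby, 0.0, LEASE_SECONDS, now)
# The old active has only seen 6s pass, so it still thinks it holds the lease.
old_still_confident = believes_lease_valid(old_active, 0.0, LEASE_SECONDS, now)

print(standby_sees_expired, old_still_confident)
```

When both values are true at the same real time, two nodes each believe they are the active one. That is the split-brain window a lease cannot rule out unless clock rates and scheduling delays are bounded.
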
>>> On any OS running in any virtual environment, including EC2, time is
>>> entirely unpredictable, just to make things worse.
>>>
>>>
>>> On a single machine you can use file locking, as the OS will know that
>>> the process is dead and close the file; other programs can attempt to open
>>> the same file with exclusive locking and, by getting the right failures,
>>> know that something else has the file, hence that the other process is live.
>>> With shared NFS storage you need to mount with softlock set, precisely to stop
>>> file locks lasting until some lease has expired, because the on-host
>>> liveness probes detect failure faster and want to react to it.
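
The single-machine file locking described above can be sketched in Python roughly as follows. This assumes a POSIX system and a local filesystem; the helper name `try_exclusive_lock` and the lock file path are hypothetical. A second `open()` in the same process stands in for a second program here, since `flock` locks belong to the open file rather than to the process.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path: str):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    f = open(path, "a+")
    try:
        # LOCK_NB makes the call fail immediately instead of blocking.
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


# Hypothetical lock file in a fresh temporary directory.
lock_path = os.path.join(tempfile.mkdtemp(), "active.lock")

holder = try_exclusive_lock(lock_path)        # first opener gets the lock
contender = try_exclusive_lock(lock_path)     # refused: something else has the file

holder.close()                                # closing the file (or process death) releases it
after_release = try_exclusive_lock(lock_path) # now the lock can be taken again

print(holder is not None, contender is None, after_release is not None)
```

The "right failure" in this sketch is the `BlockingIOError`: it tells the contender that a live holder exists. This only works because one kernel sees both the lock and the holder's death, which is exactly what is missing once the lock lives on shared storage between machines.
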
>>>
>>>
>>> -Steve
>>>
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
--
Thanks
-balaji

--
http://balajin.net/blog/
http://flic.kr/balajijegan