Zookeeper, mail # dev - Does abrupt kill corrupts the datadir?


RE: FW: Does abrupt kill corrupts the datadir?
Laxman 2011-07-28, 07:50
Thanks for the responses Mahadev, Pat and Ben.
I understand your explanation.

My only question is: "Is there any probability of data loss in the scenario
mentioned?"

>>> In the worst case, if the latest snaps on all zookeeper nodes get corrupted,
>>> there is a chance of data loss.

>> if we use sigterm in the script, we would want to put a timeout in to
>> escalate to a -9

As Ben mentioned, even if we escalate to "kill -9" to ensure shutdown, we
may still have data loss. But the probability is much lower if we first give
the process a chance to shut down gracefully.
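
For reference, the "SIGTERM first, then escalate to -9 after a timeout" sequence discussed above could be sketched roughly as follows. This is only an illustration in Python, not what zkServer.sh does; the function name stop_with_escalation and the 5 second grace period are assumptions made for the example.

```python
import signal
import subprocess


def stop_with_escalation(proc, grace_seconds=5.0):
    """Ask a child process to stop, escalating to SIGKILL if it does not.

    Hypothetical helper for illustration only. Returns True if SIGKILL
    had to be used, False if the process exited within the grace period.
    """
    # Popen.terminate() sends SIGTERM on POSIX systems.
    proc.terminate()
    try:
        # Give the process a chance to run its own cleanup and exit.
        proc.wait(timeout=grace_seconds)
        return False
    except subprocess.TimeoutExpired:
        # The process ignored or could not act on SIGTERM: force it down.
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        proc.wait()  # reap the child so it does not linger as a zombie
        return True
```

A caller would start the server with subprocess.Popen and pass the resulting object in. The timeout is the extra complexity Ben refers to: without it, a process that hangs on SIGTERM would never be stopped.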

Please do correct me if my understanding is wrong.
--
Laxman

-----Original Message-----
From: Benjamin Reed [mailto:[EMAIL PROTECTED]]
Sent: Thursday, July 28, 2011 11:40 AM
To: [EMAIL PROTECTED]
Subject: Re: FW: Does abrupt kill corrupts the datadir?

i agree with pat. if we use sigterm in the script, we would want to
put a timeout in to escalate to a -9 which makes the script a bit more
complicated without reason since we don't have any exit hooks that we
want to run. zookeeper is designed to recover well from hard failures,
much worse than a kill -9. i don't think we want to change that.

ben

On Wed, Jul 27, 2011 at 10:25 AM, Patrick Hunt <[EMAIL PROTECTED]> wrote:
> ZK has been built around the "fail fast" approach. In order to
> maintain high availability we want to ensure that restarting a server
> will result in it attempting to rejoin the quorum. IMO we would not
> want to change this (kill -9).
>
> Patrick
>
> On Tue, Jul 26, 2011 at 2:02 AM, Laxman <[EMAIL PROTECTED]> wrote:
>> Hi Everyone,
>>
>> Any thoughts?
>> Do we need consider changing abrupt shutdown to
>>
>> Implementations in some other Hadoop ecosystem projects, for your
>> reference:
>> Hadoop - kill [SIGTERM]
>> HBase - kill [SIGTERM] and then "kill -9" [SIGKILL] if process hung
>> ZooKeeper - "kill -9" [SIGKILL]
>>
>>
>> -----Original Message-----
>> From: Laxman [mailto:[EMAIL PROTECTED]]
>> Sent: Wednesday, July 13, 2011 12:36 PM
>> To: '[EMAIL PROTECTED]'
>> Subject: RE: Does abrupt kill corrupts the datadir?
>>
>> Hi Mahadev,
>>
>> The shutdown hook was just a quick thought. Another approach would be to
>> send a kill [SIGTERM], which the process can intercept and handle.
>>
>> A first look at "kill -9" suggested the following scenario:
>>> In the worst case, if the latest snaps on all zookeeper nodes get corrupted,
>>> there is a chance of data loss.
>>
>> How can zookeeper deal with this scenario gracefully?
>>
>> Also, I feel we should give the application a chance to shut down
>> gracefully before an abrupt shutdown.
>>
>> http://en.wikipedia.org/wiki/SIGKILL
>>
>> Because SIGKILL gives the process no opportunity to do cleanup operations
>> on terminating, in most system shutdown procedures an attempt is first made
>> to terminate processes using SIGTERM, before resorting to SIGKILL.
>>
>> http://rackerhacker.com/2010/03/18/sigterm-vs-sigkill/
>>
>> The application can determine what it wants to do once a SIGTERM is
>> received. While most applications will clean up their resources and stop,
>> some may not. An application may be configured to do something completely
>> different when a SIGTERM is received. Also, if the application is in a
>> bad state, such as waiting for disk I/O, it may not be able to act on the
>> signal that was sent.
>>
>> Most system administrators will usually resort to the more abrupt signal
>> when an application doesn't respond to a SIGTERM.
>>
>> -----Original Message-----
>> From: Mahadev Konar [mailto:[EMAIL PROTECTED]]
>> Sent: Wednesday, July 13, 2011 12:02 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: Does abrupt kill corrupts the datadir?
>>
>> Hi Laxman,
>>  The servers take care of all the issues with data integrity, so a kill
>> -9 is OK. Shutdown hooks are tricky. Also, the best way to make sure
>> everything works reliably is to use kill -9 :).
>>
>> Thanks
>> mahadev
>>
>> On 7/12/11 11:16 PM, "Laxman" <[EMAIL PROTECTED]> wrote:
>>
>>>When we stop zookeeper through zkServer.sh stop, we are aborting the
there