Zookeeper >> mail # dev >> Does abrupt kill corrupts the datadir?


Re: FW: Does abrupt kill corrupts the datadir?
i agree with pat. if we use sigterm in the script, we would want to
put a timeout in to escalate to a -9 which makes the script a bit more
complicated without reason since we don't have any exit hooks that we
want to run. zookeeper is designed to recover well from hard failures,
much worse than a kill -9. i don't think we want to change that.

ben
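
For reference, the "SIGTERM with a timeout that escalates to -9" approach discussed above could look roughly like the sketch below. It is not part of zkServer.sh and not something this thread adopts. The function name `stop_with_escalation` and the 10-second default timeout are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT from zkServer.sh): send SIGTERM first, then
# escalate to SIGKILL if the process is still alive after a timeout.
stop_with_escalation() {
  local pid="$1"
  local timeout="${2:-10}"   # seconds to wait before escalating (assumed default)
  local waited=0

  # Ask the process to terminate; if it is already gone there is nothing to do.
  kill -TERM "$pid" 2>/dev/null || return 0

  # `kill -0` sends no signal; it only checks that the pid can still be signalled.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      # Grace period is over: fall back to the abrupt kill.
      kill -KILL "$pid" 2>/dev/null
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

In the stop branch quoted further down, a call such as `stop_with_escalation "$(cat "$ZOOPIDFILE")" 10` would stand in for the single `$KILL -9` line. The polling loop and the timeout are the extra complexity referred to above.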

On Wed, Jul 27, 2011 at 10:25 AM, Patrick Hunt <[EMAIL PROTECTED]> wrote:
> ZK has been built around the "fail fast" approach. In order to
> maintain high availability we want to ensure that restarting a server
> will result in it attempting to rejoin the quorum. IMO we would not
> want to change this (kill -9).
>
> Patrick
>
> On Tue, Jul 26, 2011 at 2:02 AM, Laxman <[EMAIL PROTECTED]> wrote:
>> Hi Everyone,
>>
>> Any thoughts?
>> Do we need to consider changing the abrupt shutdown?
>>
>> Implementations in some other Hadoop ecosystem projects, for your reference:
>> Hadoop - kill [SIGTERM]
>> HBase - kill [SIGTERM] and then "kill -9" [SIGKILL] if process hung
>> ZooKeeper - "kill -9" [SIGKILL]
>>
>>
>> -----Original Message-----
>> From: Laxman [mailto:[EMAIL PROTECTED]]
>> Sent: Wednesday, July 13, 2011 12:36 PM
>> To: '[EMAIL PROTECTED]'
>> Subject: RE: Does abrupt kill corrupts the datadir?
>>
>> Hi Mahadev,
>>
>> A shutdown hook was just a quick thought. Another approach would be to send a
>> kill [SIGTERM], which the process can catch and handle.
>>
>> A first look at "kill -9" suggested the following scenario:
>>>In the worst case, if the latest snapshots on all ZooKeeper nodes get corrupted, there
>>>is a chance of data loss.
>>
>> How can ZooKeeper deal with this scenario gracefully?
>>
>> Also, I feel we should give the application a chance to shut down gracefully
>> before an abrupt shutdown.
>>
>> http://en.wikipedia.org/wiki/SIGKILL
>>
>> Because SIGKILL gives the process no opportunity to do cleanup operations on
>> terminating, in most system shutdown procedures an attempt is first made to
>> terminate processes using SIGTERM, before resorting to SIGKILL.
>>
>> http://rackerhacker.com/2010/03/18/sigterm-vs-sigkill/
>>
>> The application can determine what it wants to do once a SIGTERM is
>> received. While most applications will clean up their resources and stop,
>> some may not. An application may be configured to do something completely
>> different when a SIGTERM is received. Also, if the application is in a bad
>> state, such as waiting for disk I/O, it may not be able to act on the signal
>> that was sent.
>>
>> Most system administrators will usually resort to the more abrupt signal
>> when an application doesn't respond to a SIGTERM.
>>
>> -----Original Message-----
>> From: Mahadev Konar [mailto:[EMAIL PROTECTED]]
>> Sent: Wednesday, July 13, 2011 12:02 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: Does abrupt kill corrupts the datadir?
>>
>> Hi Laxman,
>>  The servers take care of all the issues with data integrity, so a kill
>> -9 is OK. Shutdown hooks are tricky. Also, the best way to make sure
>> everything works reliably is to use kill -9 :).
>>
>> Thanks
>> mahadev
>>
>> On 7/12/11 11:16 PM, "Laxman" <[EMAIL PROTECTED]> wrote:
>>
>>>When we stop zookeeper through zkServer.sh stop, we are aborting the
>>>zookeeper process using "kill -9".
>>>
>>>stop)
>>>    echo -n "Stopping zookeeper ... "
>>>    if [ ! -f "$ZOOPIDFILE" ]
>>>    then
>>>      echo "error: could not find file $ZOOPIDFILE"
>>>      exit 1
>>>    else
>>>      $KILL -9 $(cat "$ZOOPIDFILE")
>>>      rm "$ZOOPIDFILE"
>>>      echo STOPPED
>>>      exit 0
>>>    fi
>>>    ;;
>>>
>>>This may corrupt the snapshot and transaction logs. Also, it's not
>>>recommended to use "kill -9".
>>>
>>>In the worst case, if the latest snapshots on all ZooKeeper nodes get corrupted, there
>>>is a chance of data loss.
>>>
>>>How about introducing a shutdown hook which will ensure zookeeper is