Zookeeper >> mail # dev >> Does abrupt kill corrupts the datadir?


Laxman 2011-07-13, 06:16
Mahadev Konar 2011-07-13, 06:31
Laxman 2011-07-13, 07:05
Laxman 2011-07-26, 09:02
Patrick Hunt 2011-07-27, 17:25

Re: FW: Does abrupt kill corrupts the datadir?
i agree with pat. if we use sigterm in the script, we would want to
put a timeout in to escalate to a -9 which makes the script a bit more
complicated without reason since we don't have any exit hooks that we
want to run. zookeeper is designed to recover well from hard failures,
much worse than a kill -9. i don't think we want to change that.

ben
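
For readers who want to see what the "SIGTERM, then escalate to -9 after a timeout" logic mentioned above would involve, here is a minimal sketch. It is not part of zkServer.sh; the function name `stop_with_escalation` and the 10-second default timeout are assumptions chosen purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from zkServer.sh): send SIGTERM, wait up to
# a timeout for the process to exit, then escalate to SIGKILL.
# Usage: stop_with_escalation PID [TIMEOUT_SECONDS]
# Returns 0 if the process exited after SIGTERM (or was already gone),
# 1 if SIGKILL had to be sent.
stop_with_escalation() {
    swe_pid=$1
    swe_timeout=${2:-10}   # assumed default, in seconds

    # Ask the process to terminate; if the signal cannot be delivered,
    # treat the process as already stopped.
    kill -TERM "$swe_pid" 2>/dev/null || return 0

    swe_waited=0
    # "kill -0" sends no signal; it only checks that the PID still exists.
    while kill -0 "$swe_pid" 2>/dev/null; do
        if [ "$swe_waited" -ge "$swe_timeout" ]; then
            # Grace period is over: force termination.
            kill -KILL "$swe_pid" 2>/dev/null
            return 1
        fi
        sleep 1
        swe_waited=$((swe_waited + 1))
    done
    return 0
}
```

In a stop script this could be called as `stop_with_escalation "$(cat "$ZOOPIDFILE")" 10`. Even this small version needs a polling loop, a timeout, and two exit paths, which is the extra complexity the reply above argues is not justified when there are no exit hooks to run.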

On Wed, Jul 27, 2011 at 10:25 AM, Patrick Hunt <[EMAIL PROTECTED]> wrote:
> ZK has been built around the "fail fast" approach. In order to
> maintain high availability we want to ensure that restarting a server
> will result in it attempting to rejoin the quorum. IMO we would not
> want to change this (kill -9).
>
> Patrick
>
> On Tue, Jul 26, 2011 at 2:02 AM, Laxman <[EMAIL PROTECTED]> wrote:
>> Hi Everyone,
>>
>> Any thoughts?
>> Do we need consider changing abrupt shutdown to
>>
>> Implementations in some other Hadoop ecosystem projects, for your reference:
>> Hadoop - kill [SIGTERM]
>> HBase - kill [SIGTERM] and then "kill -9" [SIGKILL] if process hung
>> ZooKeeper - "kill -9" [SIGKILL]
>>
>>
>> -----Original Message-----
>> From: Laxman [mailto:[EMAIL PROTECTED]]
>> Sent: Wednesday, July 13, 2011 12:36 PM
>> To: '[EMAIL PROTECTED]'
>> Subject: RE: Does abrupt kill corrupts the datadir?
>>
>> Hi Mahadev,
>>
>> The shutdown hook was just a quick thought. Another approach is to simply
>> send a kill [SIGTERM], which the process can catch and handle.
>>
>> A first look at "kill -9" raised the following scenario:
>>>In the worst case, if the latest snapshots on all ZooKeeper nodes get
>>>corrupted, there is a chance of data loss.
>>
>> How can ZooKeeper deal with this scenario gracefully?
>>
>> Also, I feel we should give the application a chance to shut down gracefully
>> before an abrupt shutdown.
>>
>> http://en.wikipedia.org/wiki/SIGKILL
>>
>> Because SIGKILL gives the process no opportunity to do cleanup operations on
>> terminating, in most system shutdown procedures an attempt is first made to
>> terminate processes using SIGTERM, before resorting to SIGKILL.
>>
>> http://rackerhacker.com/2010/03/18/sigterm-vs-sigkill/
>>
>> The application can determine what it wants to do once a SIGTERM is
>> received. While most applications will clean up their resources and stop,
>> some may not. An application may be configured to do something completely
>> different when a SIGTERM is received. Also, if the application is in a bad
>> state, such as waiting for disk I/O, it may not be able to act on the signal
>> that was sent.
>>
>> Most system administrators will usually resort to the more abrupt signal
>> when an application doesn't respond to a SIGTERM.
>>
>> -----Original Message-----
>> From: Mahadev Konar [mailto:[EMAIL PROTECTED]]
>> Sent: Wednesday, July 13, 2011 12:02 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: Does abrupt kill corrupts the datadir?
>>
>> Hi Laxman,
>>  The servers take care of all the issues with data integrity, so a kill
>> -9 is OK. Shutdown hooks are tricky. Also, the best way to make sure
>> everything works reliably is to use kill -9 :).
>>
>> Thanks
>> mahadev
>>
>> On 7/12/11 11:16 PM, "Laxman" <[EMAIL PROTECTED]> wrote:
>>
>>>When we stop zookeeper through zkServer.sh stop, we are aborting the
>>>zookeeper process using "kill -9".
>>>
>>>stop)
>>>    echo -n "Stopping zookeeper ... "
>>>    if [ ! -f "$ZOOPIDFILE" ]
>>>    then
>>>      echo "error: could not find file $ZOOPIDFILE"
>>>      exit 1
>>>    else
>>>      $KILL -9 $(cat "$ZOOPIDFILE")
>>>      rm "$ZOOPIDFILE"
>>>      echo STOPPED
>>>      exit 0
>>>    fi
>>>    ;;
>>>
>>>This may corrupt the snapshot and transaction logs. Also, it's not
>>>recommended to use "kill -9".
>>>
>>>In the worst case, if the latest snapshots on all ZooKeeper nodes get
>>>corrupted, there is a chance of data loss.
>>>
>>>
>>>
>>>How about introducing a shutdown hook which will ensure zookeeper is

Laxman 2011-07-28, 07:50
Benjamin Reed 2011-07-28, 16:05
Andrei Savu 2011-07-28, 23:14
Patrick Hunt 2011-08-01, 18:37
Laxman 2011-07-29, 09:26