Accumulo >> mail # user >> Suspension


Systems I've used that include automatic restart usually limit it to 3-4
restarts in a row before giving up. It's nice if you can put a timeout on
that counter, so you retain the auto-restart capability if you need to
suspend a few days from now.
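
A minimal sketch of such a capped restart counter with a timeout, assuming a sliding time window is an acceptable way to let old restarts age out. The class name, the default limit, and the window length are illustrative choices, not taken from any particular system:

```python
import time
from collections import deque


class RestartBudget:
    """Allow at most max_restarts within window_seconds.

    Restarts older than the window are forgotten, so the budget
    refills on its own after a quiet period.
    """

    def __init__(self, max_restarts=3, window_seconds=3600.0, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the policy can be tested without sleeping
        self._stamps = deque()  # times of recent restarts, oldest first

    def allow_restart(self):
        now = self.clock()
        # Drop restarts that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_restarts:
            return False  # too many recent restarts: give up
        self._stamps.append(now)
        return True
```

A supervisor would call `allow_restart()` before each relaunch and stop relaunching once it returns `False`.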

I've also worked on a system where process restarts were the way we handled
failures. ZooKeeper state can be tricky to recover if you've been down for
long enough for your session to expire. I found it easier to just kill the
process and go through the full "boot-up" logic. In that system, the shell
scripts that launched the JVMs handled the restart, with the restart policy
dictated by exit code.
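
A rough sketch of an exit-code-driven restart loop, written in Python rather than the shell scripts described above. The exit code 75 and the names `RESTART_EXIT_CODES` and `supervise` are hypothetical; the real system's codes are not given in this thread:

```python
import subprocess

# Hypothetical convention: the child exits with one of these codes
# when it wants to be relaunched (e.g. after losing its session).
RESTART_EXIT_CODES = {75}


def supervise(command, restart_codes=RESTART_EXIT_CODES, max_restarts=3):
    """Run command, relaunching it while its exit code asks for a restart.

    Returns (last_exit_code, restarts_performed). Any exit code outside
    restart_codes, or hitting max_restarts, ends the loop.
    """
    restarts = 0
    while True:
        code = subprocess.run(command).returncode
        if code not in restart_codes or restarts >= max_restarts:
            return code, restarts
        restarts += 1
```

With this arrangement the process itself decides whether a death was accidental: it exits with a restartable code only on failures it knows a clean boot-up will fix, and any other exit leaves it down.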

-Joey

On Wed, Feb 15, 2012 at 11:16 AM, John Vines <[EMAIL PROTECTED]> wrote:

> There are too many cases where a node legitimately died and we do not want
> it constantly coming back and bogging things down. How do you design it to
> restart the accidental deaths but not the deserved ones?
> On Feb 15, 2012 11:11 AM, "Adam Fuchs" <[EMAIL PROTECTED]> wrote:
>
>> This isn't really just a laptop problem. We also see hiccups in clusters
>> (admins accidentally the whole network, etc.) that we would want to
>> automatically recover from. I think having self-restarting processes could
>> be generally useful.
>>
>> I think that an option of not using zookeeper timeouts might lead to
>> abuse, and could be very bad for stability under rare failure modes. We
>> make a lot of assumptions throughout the code about these timeouts, and we
>> would have to reconsider a large part of that model.
>>
>> Adam
>>
>>
>> On Wed, Feb 15, 2012 at 10:56 AM, Billie J Rinaldi <
>> [EMAIL PROTECTED]> wrote:
>>
>>> On Wednesday, February 15, 2012 10:38:41 AM, "Aaron Cordova" <
>>> [EMAIL PROTECTED]> wrote:
>>> > Such an option would have to be very conspicuous so that users don't
>>> > accidentally enable it and then wonder why bad tablet servers aren't
>>> > removed automatically from the cluster.
>>>
>>> We could call it laptop.mode.
>>>
>>> Billie
>>>
>>
>>
--
Joseph Echeverria
Cloudera, Inc.
443.305.9434