Re: Question about WAL writes after region server "soft failures"
Jimmy Xiang 2012-09-07, 19:32
When the dead region server comes back, it won't be able to write data
to the WAL anymore.
As the first step of log splitting, the dead region server's WAL
directory is renamed. When the dead region server then tries to write
to the WAL, it finds that the file is no longer there.
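A toy model may make the ordering in the reply concrete. It is a minimal sketch under loudly stated assumptions: Namespace, WalWriter, split_logs, the wals/<server> layout and the -splitting suffix are all hypothetical names invented for illustration, not HBase or HDFS APIs, and the model ignores HDFS internals entirely. It captures only the point made in the reply: the master renames the WAL directory before replaying anything, so any later write from the old server that is addressed by the original path fails instead of being acknowledged.

```python
# Toy model of rename-based WAL fencing. Every name here (Namespace, WalWriter,
# split_logs, the "wals/<server>" layout, the "-splitting" suffix) is
# hypothetical and for illustration only; it is not the HDFS or HBase API.


class Namespace:
    """In-memory stand-in for a filesystem namespace: dir -> file -> edits."""

    def __init__(self):
        self.dirs = {}

    def mkdir(self, path):
        self.dirs.setdefault(path, {})

    def create(self, path, name):
        if path not in self.dirs:
            raise FileNotFoundError(path)
        self.dirs[path][name] = []

    def append(self, path, name, edit):
        # Appends are resolved by path, so they fail once the directory is renamed.
        if path not in self.dirs or name not in self.dirs[path]:
            raise FileNotFoundError(f"{path}/{name}")
        self.dirs[path][name].append(edit)

    def rename(self, src, dst):
        if src not in self.dirs:
            raise FileNotFoundError(src)
        self.dirs[dst] = self.dirs.pop(src)


class WalWriter:
    """One region server's WAL: a per-server directory holding one log file."""

    def __init__(self, ns, server):
        self.ns = ns
        self.log_dir = f"wals/{server}"
        self.log_name = "log.1"
        ns.mkdir(self.log_dir)
        ns.create(self.log_dir, self.log_name)

    def write(self, edit):
        """Append an edit; return True only if it reached the log (safe to ack)."""
        try:
            self.ns.append(self.log_dir, self.log_name, edit)
            return True
        except FileNotFoundError:
            return False  # log is gone: the server must not ack this edit


def split_logs(ns, server):
    """Master side: rename the dead server's WAL dir first, then read its edits."""
    src = f"wals/{server}"
    dst = src + "-splitting"
    ns.rename(src, dst)
    return [edit for log in ns.dirs[dst].values() for edit in log]


ns = Namespace()
rs1 = WalWriter(ns, "rs1")
rs1.write("put r1")               # normal operation before RS1 hangs
replayed = split_logs(ns, "rs1")  # RS1 is declared dead; master splits its logs
print(replayed)                   # ['put r1'] -> what RS2 would replay
print(rs1.write("put r2"))        # False -> RS1 is back, but the write is fenced
try:
    ns.create(rs1.log_dir, "log.2")  # rolling to a new log file fails too
except FileNotFoundError as err:
    print("roll failed:", err)       # roll failed: wals/rs1
```

In this model the design point is ordering: because the rename happens before the edits are read for replay, nothing the old server writes afterwards by the original path can land in the files being split.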
On Fri, Sep 7, 2012 at 12:19 PM, Nick Puz <[EMAIL PROTECTED]> wrote:
> I'm new to HBase and HDFS and have a question about what happens when a
> failure is detected and a new region server takes over a region. If the old
> region server hasn't really failed and "comes back", will it still accept
> writes?
> Here's a specific sequence of events:
> 1) region R is currently being served by region server RS1.
> 2) RS1 hangs for some reason (long GC, network hiccup, etc)
> 3) the master is notified that RS1 is down, so it splits RS1's logs and
> reassigns its regions. From the code, log splitting renames the log
> directory, so if RS1 tries to create a new log file, it will fail.
> 4) region server RS2 is assigned the region, replays the log, and all is well.
> 5) RS1 comes back to life.
> After 5 happens:
> - if it had in-flight requests, will it write them to the WAL and eventually
> flush the memstores?
> - if it gets new requests, will it service them as long as it is still
> appending to the same block in the WAL file?
> One way to keep clients from getting acks would be to set the client
> timeout lower than the ZooKeeper session timeout
> (zookeeper.session.timeout), which seems like a logical thing to do.
> But even if the timeouts were set so that the client got a timeout, are there
> scenarios where the edits would be readable by other clients (say, if that
> log file were rescanned)?
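The timeout argument in the quoted message can be checked with a deliberately simplified timeline model. The assumptions are simplifications, not HBase behavior: the write reaches RS1 just as it stalls, the master reassigns the region only if the stall outlasts the ZooKeeper session timeout, and the client still receives the late ack only if the stall is shorter than the client's own timeout. The function names and millisecond values below are hypothetical illustrations, not HBase defaults. Under these assumptions a stale ack needs zk_session_timeout < stall < client_timeout, which is impossible when the client timeout is the smaller of the two.

```python
# Simplified timeline model (an assumption, not HBase code): a write reaches the
# region server just as it stalls for `stall_ms`. The region is reassigned only
# if the stall outlasts the ZooKeeper session; the client still receives the
# late ack only if the stall is shorter than the client's own timeout.


def stale_ack_possible(stall_ms, client_timeout_ms, zk_session_timeout_ms):
    """True if the client could get an ack for a write to an already-reassigned region."""
    region_reassigned = stall_ms > zk_session_timeout_ms
    client_still_waiting = stall_ms < client_timeout_ms
    return region_reassigned and client_still_waiting


def any_stale_ack(client_timeout_ms, zk_session_timeout_ms, max_stall_ms=600_000):
    """Sweep stall lengths in 1 s steps and report whether any yields a stale ack."""
    return any(
        stale_ack_possible(stall, client_timeout_ms, zk_session_timeout_ms)
        for stall in range(0, max_stall_ms + 1, 1000)
    )


# Hypothetical timeout values, chosen only to show both cases.
print(any_stale_ack(client_timeout_ms=60_000, zk_session_timeout_ms=180_000))   # False
print(any_stale_ack(client_timeout_ms=300_000, zk_session_timeout_ms=180_000))  # True
```

The model only speaks to whether an ack can arrive after reassignment; it says nothing about the second question, whether an unacked edit could still become readable.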