André Oriani 2011-06-11, 02:36
-Re: Extracting Zab from Zookeeper
Andrew Purtell 2011-02-03, 18:52
> I don't understand why you have to ship the log to the read only replicas. Aren't you storing the log on HDFS currently? Can't they read from HDFS directly?
Possibly the replicas can "tail" the WAL of the master; I was using the term "log shipping" in the abstract. However, I'm not an HDFS expert, so I'm unsure whether we could read the last (partial) block of the WAL. Newly written data exists only in memory, so without some sort of direct replication the WAL would be the only option for transmitting this data until a flush.
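To make "tailing" concrete: a replica remembers how far into the log it has read and applies only complete records, leaving a half-written tail for a later poll. The minimal Python sketch below assumes a local file and a hypothetical length-prefixed record format (`HEADER`, `append_record` and `LogTailer` are illustrative names). It is not the HBase WAL format or the HDFS client API, and whether an HDFS reader can see the last partial block is exactly the open question above.

```python
import os
import struct
import tempfile

# Hypothetical record format: 4-byte big-endian length prefix followed by the payload.
HEADER = struct.Struct(">I")


def append_record(f, payload: bytes) -> None:
    """Writer (master) side: append one length-prefixed record and flush it."""
    f.write(HEADER.pack(len(payload)) + payload)
    f.flush()


class LogTailer:
    """Reader (replica) side: remembers its offset and returns only complete records."""

    def __init__(self, path: str):
        self.path = path
        self.offset = 0  # bytes of the log already consumed

    def poll(self) -> list[bytes]:
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        records, pos = [], 0
        while pos + HEADER.size <= len(data):
            (length,) = HEADER.unpack_from(data, pos)
            end = pos + HEADER.size + length
            if end > len(data):
                break  # last record only partially written; pick it up on a later poll
            records.append(data[pos + HEADER.size:end])
            pos = end
        self.offset += pos
        return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "wal.log")
        with open(path, "wb") as w:
            tailer = LogTailer(path)
            append_record(w, b"put row1")
            w.write(HEADER.pack(8) + b"put ")  # simulate a torn, half-written record
            w.flush()
            print(tailer.poll())  # [b'put row1']
            w.write(b"row2")
            w.flush()
            print(tailer.poll())  # [b'put row2']
```

The replica never applies a torn record; it simply sees the edit one poll later, once the writer has finished it.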
> I wonder why you are choosing 3 for the size of a clique and not letting it be a free parameter.
It would be, but 3 seems a reasonable default. (?)
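If the clique size were a free parameter, the usual majority-quorum arithmetic (the kind ZooKeeper uses by default) shows what each size buys. This is a small illustrative helper; the function names are hypothetical.

```python
def quorum_size(n: int) -> int:
    """Smallest majority of a clique of n members."""
    if n < 1:
        raise ValueError("a clique needs at least one member")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Members that can crash while the rest still form a majority."""
    return n - quorum_size(n)


if __name__ == "__main__":
    for n in range(1, 8):
        # every write must reach quorum_size(n) members, the leader included
        print(f"n={n}: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

With majority quorums, 3 is the smallest clique that survives one failure, and an even size (e.g. 4) adds replication cost without tolerating any more failures than the odd size below it.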
> Are you choosing 3 to avoid the replication overhead?
> #1 is relatively simple but trades away the consistency
> I don't see where you could have inconsistencies here. Would you mind elaborating a bit further?
At any given instant, queries to a replica may not return the same result as the (write) master for data in the memstore and (possibly) in the last block of the WAL.
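The inconsistency described here is a stale read: the master serves an edit from its memstore immediately, while an asynchronously fed replica serves it only after catching up on the log. A toy Python model, assuming in-memory dicts in place of memstores and a Python list in place of the WAL (`RegionMaster` and `ReadReplica` are hypothetical names, not HBase classes):

```python
class RegionMaster:
    """Hypothetical write master: each edit goes to the log and to the in-memory memstore."""

    def __init__(self):
        self.log = []       # ordered edits, to be shipped/tailed by replicas
        self.memstore = {}  # newest values, visible to reads on the master at once

    def put(self, key, value):
        self.log.append((key, value))
        self.memstore[key] = value

    def get(self, key):
        return self.memstore.get(key)


class ReadReplica:
    """Hypothetical read-only replica that applies log entries only when it catches up."""

    def __init__(self):
        self.memstore = {}
        self.applied = 0  # number of log entries applied so far

    def catch_up(self, log):
        for key, value in log[self.applied:]:
            self.memstore[key] = value
        self.applied = len(log)

    def get(self, key):
        return self.memstore.get(key)


if __name__ == "__main__":
    master, replica = RegionMaster(), ReadReplica()
    master.put("row1", "v1")
    replica.catch_up(master.log)
    master.put("row1", "v2")  # not yet shipped to the replica
    print(master.get("row1"), replica.get("row1"))  # v2 v1  (stale read on the replica)
    replica.catch_up(master.log)
    print(replica.get("row1"))  # v2
```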
Problems worthy of attack prove their worth by hitting back.
- Piet Hein (via Tom White)
--- On Wed, 2/2/11, Flavio Junqueira <[EMAIL PROTECTED]> wrote:
From: Flavio Junqueira <[EMAIL PROTECTED]>
Subject: Re: Extracting Zab from Zookeeper
Date: Wednesday, February 2, 2011, 2:14 AM
Hi Andrew, interesting use case, thanks for sharing. I'm curious about a few things:
On Feb 1, 2011, at 5:38 PM, Andrew Purtell wrote:
Two ideas actually:
1) Do pretty straightforward log shipping from the region master to read-only replicas.
I don't understand why you have to ship the log to the read only replicas. Aren't you storing the log on HDFS currently? Can't they read from HDFS directly?
2) Divide the cluster into quorum 3-cliques. Extract ZAB and use it to maintain consensus on writes from the region master to two read-only replicas. Run the consensus protocol in parallel with the HDFS hflush to the write-ahead log. It needs a lot of work to fill in the details, obviously, but that's the general notion.
I wonder why you are choosing 3 for the size of a clique and not letting it be a free parameter. I would think that this is a decision for the user. Are you choosing 3 to avoid the replication overhead?
#1 is relatively simple, but it trades away the consistency that makes HBase the indicated choice in exchange for higher availability (for reads) when regions are in transition.
I don't see where you could have inconsistencies here. Would you mind elaborating a bit further?
#2 is not simple at all, but it may let us maintain replicas that are fully consistent with the region master at all times, without lowering region master write performance unacceptably, while also gaining higher availability (for reads) when regions are in transition.
Agreed, it will be tricky, especially because we would have to extract Zab first.
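The core of idea #2 (running the consensus round in parallel with the WAL hflush) is a commit rule: acknowledge a write only once the master's WAL flush has finished and a majority of the clique holds the edit. The minimal Python sketch below models just that rule, assuming the master counts as one vote. `hflush_wal` and `propose_to_replica` are hypothetical stand-ins that only sleep; they are not HDFS or ZooKeeper APIs. Everything Zab actually provides (total ordering, leader election, recovery) is left out.

```python
import concurrent.futures as cf
import time


def hflush_wal(edit: str) -> bool:
    # Hypothetical stand-in for an HDFS hflush of the WAL entry; here it only sleeps.
    time.sleep(0.01)
    return True


def propose_to_replica(replica_id: str, edit: str, delay: float) -> str:
    # Hypothetical stand-in for a Zab-style proposal; the replica acks after `delay` seconds.
    time.sleep(delay)
    return replica_id


def replicated_write(edit: str, replica_delays: dict[str, float]) -> list[str]:
    """Commit `edit` once the WAL flush is done AND a majority of the clique has acked.

    The clique is the master plus the replicas in `replica_delays`; the master is one vote.
    Returns the ids of the replicas whose acks completed the quorum.
    """
    clique_size = 1 + len(replica_delays)
    needed_acks = clique_size // 2  # majority minus the master's own vote
    pool = cf.ThreadPoolExecutor(max_workers=clique_size)
    try:
        wal = pool.submit(hflush_wal, edit)  # runs in parallel with the proposals
        proposals = [pool.submit(propose_to_replica, rid, edit, d)
                     for rid, d in replica_delays.items()]
        acks = []
        for fut in cf.as_completed(proposals):
            if fut.exception() is None:  # a failed replica simply does not vote
                acks.append(fut.result())
            if len(acks) >= needed_acks:
                break  # quorum reached; do not wait for the slowest replica
        if len(acks) < needed_acks:
            raise RuntimeError("no quorum")
        if not wal.result():
            raise RuntimeError("WAL flush failed")
        return acks
    finally:
        pool.shutdown(wait=False)  # stragglers finish in the background


if __name__ == "__main__":
    print(replicated_write("put row1", {"r1": 0.01, "r2": 0.5}))  # ['r1']
```

Because only a majority is awaited, the slow replica `r2` adds nothing to write latency: the write waits on the slower of the WAL flush and the fastest replica ack.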
Benjamin Reed 2011-01-19, 19:32
André Oriani 2011-01-20, 16:35
Patrick Hunt 2011-01-20, 17:45
André Oriani 2011-01-22, 07:43
Flavio Junqueira 2011-01-22, 15:26
André Oriani 2011-01-28, 07:24
Ted Dunning 2011-01-28, 15:03
Benjamin Reed 2011-01-28, 18:27
Andrew Purtell 2011-01-31, 00:56
Mahadev Konar 2011-01-31, 03:21
Andrew Purtell 2011-02-01, 16:38
Flavio Junqueira 2011-02-02, 10:14