HBase, mail # dev - [PROPOSAL] HBASE-10070 branch


Re: [PROPOSAL] HBASE-10070 branch
Devaraj Das 2014-01-16, 02:29
On Wed, Jan 15, 2014 at 4:43 PM, Elliott Clark <[EMAIL PROTECTED]> wrote:
> On Wed, Jan 15, 2014 at 3:57 PM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
>
>> I am afraid it is not only coprocessors or the current set of plugins.
>> We need changes in the RPC, meta, region server, LB and master. Since we
>> cannot easily get hooks into all of these in a clean manner, implementing
>> this purely outside would be next to impossible.
>>
>
> I'm pretty unconvinced that this is the correct way forward.  It seems to
> introduce a lot of risk without a lot of gain.  Right now to me the 100%
> correct way forward is through paxos.  That's a lot of work but it has the
> most payoff in the end.  It will allow much faster recovery, much easier
> read sharding, and the greatest flexibility on IO.
>

Elliott, if I am not mistaken, we will need the replica management
work for the Paxos case as well. A lot of the work done in HBASE-10070
(to start with, the master/loadbalancer side of the region-replica
management work) would be leveraged if we choose to implement Paxos.

> On the other end of the spectrum is something like MySQL/Postgres read
> slaves (either tables or clusters).  Read slaves built on top of what's
> currently there seem to give all of the benefits of read slaves built into
> the current HBase without all of the risk. Sharding on top of the already
> built datastore is a pretty well known and well understood problem.  There
> are lots of great examples of making this scale to pretty insane heights.
> You lose very little flexibility and incur almost no risk to the
> stability of HBase.

We have gone over this point before. We are trying to address the
issue within a single cluster. We don't want to create more storage
overhead if we can help it (which we would have if we did
intra-cluster replication).
Again, the default behavior (a single replica per region, etc.) is kept
intact. This should hold from the stability point of view as well.
