HBase, mail # user - consistency, availability and partition pattern of HBase


Re: consistency, availability and partition pattern of HBase
Amandeep Khurana 2012-08-09, 05:34
Firstly, I recommend you read the GFS and Bigtable papers. That'll give you
a good understanding of the architecture. Ad hoc questions on the mailing
list won't.

I'll try to answer some of your questions briefly. Think of HBase as a
database layer over an underlying filesystem (the same way MySQL is over
ext2/3/4 etc). The filesystem for HBase in this case is HDFS. HDFS
replicates data for redundancy and fault tolerance. HBase has region
servers that serve the regions. Regions form tables. Region servers persist
their data on HDFS. Now, every region is served by one and only one region
server. So, HBase is not replicating anything. Replication is handled at
the storage layer. If a region server goes down, all its regions now need
to be served by some other region server. During this period of region
assignment, the clients experience degraded availability if they try to
interact with any of those regions.
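
To make the assignment model above concrete, here is a minimal sketch in Python. It is a toy model, not HBase code: the class ToyCluster, its methods, and the server names are all hypothetical, and the round-robin placement is an assumption chosen only for illustration. It shows that each key range (region) has exactly one owning server, and that keys in a failed server's regions cannot be served until those regions are reassigned.

```python
import bisect


class ToyCluster:
    """Toy model (hypothetical): each region is served by exactly one server."""

    def __init__(self, split_keys, servers):
        # split_keys are the sorted boundaries between regions, so
        # N split keys define N + 1 regions numbered 0..N.
        self.split_keys = sorted(split_keys)
        self.live = set(servers)
        ordered = sorted(servers)
        n_regions = len(self.split_keys) + 1
        # Assumed placement policy: simple round-robin over the servers.
        self.owner = {r: ordered[r % len(ordered)] for r in range(n_regions)}

    def region_for(self, key):
        # A key belongs to the region whose range contains it.
        return bisect.bisect_right(self.split_keys, key)

    def locate(self, key):
        # Return the single server responsible for this key, or fail if
        # that server is down and the region has not been reassigned yet.
        region = self.region_for(key)
        server = self.owner[region]
        if server not in self.live:
            raise RuntimeError(f"region {region} is unavailable")
        return server

    def fail(self, server):
        # The server drops out; its regions stay orphaned until reassign().
        self.live.discard(server)

    def reassign(self):
        # Hand every orphaned region to a surviving server.
        survivors = sorted(self.live)
        if not survivors:
            return
        for i, (region, server) in enumerate(sorted(self.owner.items())):
            if server not in self.live:
                self.owner[region] = survivors[i % len(survivors)]
```

With two servers and split keys "g" and "n", a key such as "h" is served by one server only. After that server fails, locate("h") raises until reassign() runs, while keys in the other server's regions stay reachable throughout.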

Coming back to CAP. HBase chooses to degrade availability in the face of
partitions. "Partition" is a very general term here and does not
necessarily mean network partitions. Any node falling off the HBase cluster
can be considered to be a partition. So, when failures happen, HBase
degrades availability but does not give up consistency. Consistency in this
context is sort of the equivalent of atomicity in ACID. In the context of
HBase, any data that is written to HBase will be visible to all
clients. There is no concept of multiple conflicting versions that the
clients need to reconcile. When you read, you always get the same
version of the row you are reading. In other words, HBase is strongly
consistent.
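
As a rough illustration of the consistency point, the Python sketch below contrasts two toy stores. Both classes are hypothetical and neither is HBase code. SingleOwnerStore models the behaviour described above, where each row has one authoritative copy. LazyReplicaStore is a contrast case only, an assumed design in which a second copy is updated later, so a read can return a stale value.

```python
class SingleOwnerStore:
    """Toy model (hypothetical): one authoritative copy per row."""

    def __init__(self):
        self._rows = {}

    def put(self, row, value):
        # The write lands on the only copy there is.
        self._rows[row] = value

    def get(self, row):
        # Every client reads that same copy, so it sees the latest write.
        return self._rows.get(row)


class LazyReplicaStore:
    """Contrast case (assumed design, not HBase): a replica that lags behind."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def put(self, row, value):
        # Only the primary is updated; the replica catches up on sync().
        self.primary[row] = value

    def sync(self):
        # Copy all primary rows to the replica.
        self.replica.update(self.primary)

    def get(self, row, from_replica=False):
        source = self.replica if from_replica else self.primary
        return source.get(row)
```

In the single-owner store a get after a put always returns the value just written, whichever client issues it. In the contrast store, a read from the replica before sync() returns the old value, which is the kind of divergence clients would otherwise have to reconcile.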

Hope that clears things up a bit.

On Thu, Aug 9, 2012 at 8:02 AM, Lin Ma <[EMAIL PROTECTED]> wrote:

> Thank you Lars.
>
> Is the same data stored as duplicate copies across region servers? If so, if the
> primary server for a region dies, the client just needs to read from the
> secondary server for the same region. Why is there a period when data is
> unavailable?
>
> BTW: please feel free to correct me for any wrong knowledge about HBase.
>
> regards,
> Lin
>
> On Thu, Aug 9, 2012 at 9:31 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > After a write completes the next read (regardless of the location it is
> > issued from) will see the latest value.
> > This is because at any given time exactly one RegionServer is responsible for
> > a specific key
> > (through assignment of key ranges to regions and regions to RegionServers).
> >
> >
> > As Mohit said, the trade off is that data is unavailable if a RegionServer
> > dies until another RegionServer picks up the regions (and by extension the
> > key range)
> >
> > -- Lars
> >
> >
> > ----- Original Message -----
> > From: Lin Ma <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Cc:
> > Sent: Wednesday, August 8, 2012 8:47 AM
> > Subject: Re: consistency, availability and partition pattern of HBase
> >
> > And consistency is not sacrificed? I.e., will all distributed clients' updates
> > result in sequential / real-time updates? Once an update is done by one
> > client, can all other clients see the result immediately?
> >
> > regards,
> > Lin
> >
> > On Wed, Aug 8, 2012 at 11:17 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> >
> > > I think availability is sacrificed in the sense that if a region server
> > > fails, clients will find its data inaccessible until the region comes up on
> > > some other server; this is not to be confused with data loss.
> > >
> > > Sent from my iPad
> > >
> > > On Aug 7, 2012, at 11:56 PM, Lin Ma <[EMAIL PROTECTED]> wrote:
> > >
> > > > Thank you Wei!
> > > >
> > > > Two more comments,
> > > >
> > > > 1. What do you think about Hadoop's CAP characteristics?
> > > > 2. Regarding your comment, if HBase implements "per key sequential
> > > > consistency", what consistency characteristics are missing? Cross-key
> > > > update sequences? Could you show me an example of what you think is
> > > > missing?
> > > > thanks.
> > > >
> > > > regards,
> > >