HBase >> mail # user >> consistency, availability and partition pattern of HBase


Re: consistency, availability and partition pattern of HBase
HDFS also chooses to degrade availability in the face of partitions.

On Thu, Aug 9, 2012 at 11:08 AM, Lin Ma <[EMAIL PROTECTED]> wrote:

> Amandeep, thanks for your comments, and I will definitely read the paper
> you suggested.
>
> For Hadoop itself, what do you think its CAP properties are? Which one of
> C, A, and P is sacrificed?
>
> regards,
> Lin
>
> On Thu, Aug 9, 2012 at 1:34 PM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
>
>> Firstly, I recommend you read the GFS and Bigtable papers. That'll give
>> you a good understanding of the architecture. Ad hoc questions on the
>> mailing list won't.
>>
>> I'll try to answer some of your questions briefly. Think of HBase as a
>> database layer over an underlying filesystem (the same way MySQL is over
>> ext2/3/4 etc). The filesystem for HBase in this case is HDFS. HDFS
>> replicates data for redundancy and fault tolerance. HBase has region
>> servers that serve the regions. Regions form tables. Region servers persist
>> their data on HDFS. Now, every region is served by one and only one region
>> server. So, HBase is not replicating anything. Replication is handled at
>> the storage layer. If a region server goes down, all its regions now need
>> to be served by some other region server. During this period of region
>> assignment, the clients experience degraded availability if they try to
>> interact with any of those regions.
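
The single-owner model described above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not HBase code: the names ToyCluster, RegionUnavailable, fail_server, and reassign are hypothetical, and a plain dictionary stands in for data persisted on HDFS. It shows why reads fail between a region server dying and its regions being reassigned, while the data itself is never lost.

```python
class RegionUnavailable(Exception):
    """Raised when a region currently has no region server assigned."""


class ToyCluster:
    """Toy model (hypothetical): each region has at most one owning server."""

    def __init__(self, assignments):
        # region name -> owning server name (None while unassigned)
        self.owner = dict(assignments)
        # region name -> {row: value}; stands in for files persisted on HDFS
        self.storage = {region: {} for region in self.owner}

    def _require_owner(self, region):
        # A request can only be served if some server owns the region.
        if self.owner[region] is None:
            raise RegionUnavailable(region)

    def put(self, region, row, value):
        self._require_owner(region)
        self.storage[region][row] = value

    def get(self, region, row):
        self._require_owner(region)
        return self.storage[region].get(row)

    def fail_server(self, server):
        """Unassign every region owned by the failed server."""
        orphaned = [r for r, s in self.owner.items() if s == server]
        for region in orphaned:
            self.owner[region] = None
        return orphaned

    def reassign(self, region, new_server):
        """Hand the region to another server; its stored data is untouched."""
        self.owner[region] = new_server
```

In this sketch, a get() on a region whose server has failed raises RegionUnavailable until reassign() is called, after which the previously written value is readable again. Because only one owner ever serves a region, there is never a second copy of a row for clients to reconcile.
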
>>
>> Coming back to CAP. HBase chooses to degrade availability in the face of
>> partitions. "Partition" is a very general term here and does not
>> necessarily mean network partitions. Any node falling off the HBase cluster
>> can be considered to be a partition. So, when failures happen, HBase
>> degrades availability but does not give up consistency. Consistency in this
>> context is sort of the equivalent of atomicity in ACID. In the context of
>> HBase, any copy of data that is written to HBase will be visible to all
>> clients. There is no concept of multiple different versions that the
>> clients need to reconcile between. When you read, you always get the same
>> version of the row you are reading. In other words, HBase is strongly
>> consistent.
>>
>> Hope that clears things up a bit.
>>
>> On Thu, Aug 9, 2012 at 8:02 AM, Lin Ma <[EMAIL PROTECTED]> wrote:
>>
>> > Thank you Lars.
>> >
>> > Is the same data stored as duplicate copies across region servers? If so,
>> > when the primary server for a region dies, the client just needs to read
>> > from the secondary server for the same region. Why is there a period when
>> > the data is unavailable?
>> >
>> > BTW: please feel free to correct me for any wrong knowledge about HBase.
>> >
>> > regards,
>> > Lin
>> >
>> > On Thu, Aug 9, 2012 at 9:31 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>> >
>> > > After a write completes, the next read (regardless of the location it is
>> > > issued from) will see the latest value.
>> > > This is because at any given time exactly one RegionServer is responsible
>> > > for a specific key
>> > > (through assignment of key ranges to regions and regions to RegionServers).
>> > >
>> > >
>> > > As Mohit said, the trade-off is that data is unavailable if a RegionServer
>> > > dies until another RegionServer picks up the regions (and by extension the
>> > > key range).
>> > >
>> > > -- Lars
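
The mapping from key ranges to regions and from regions to RegionServers can be sketched with a sorted list of region start keys. This is a minimal illustration, not the actual HBase lookup path: RegionMap and locate are hypothetical names, the server names are made up, and the empty string is assumed to stand for the lowest possible key.

```python
import bisect


class RegionMap:
    """Hypothetical sketch: map a row key to the single server owning its range."""

    def __init__(self, regions):
        # regions: iterable of (start_key, server); each region covers the keys
        # from its start key up to (but not including) the next start key.
        regions = sorted(regions)
        if not regions or regions[0][0] != "":
            raise ValueError("the first region must start at the empty key")
        self.start_keys = [start for start, _ in regions]
        self.servers = [server for _, server in regions]

    def locate(self, row_key):
        """Return (region start key, server) for the region containing row_key."""
        # The rightmost start key that is <= row_key identifies the region.
        index = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.start_keys[index], self.servers[index]


region_map = RegionMap([("", "rs-1"), ("g", "rs-2"), ("p", "rs-3")])
owner_of_kiwi = region_map.locate("kiwi")  # falls in the range starting at "g"
```

Every key resolves to exactly one region and therefore to exactly one server, which is why a read issued from any client after a completed write goes to the same place the write did.
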
>> > >
>> > >
>> > > ----- Original Message -----
>> > > From: Lin Ma <[EMAIL PROTECTED]>
>> > > To: [EMAIL PROTECTED]
>> > > Cc:
>> > > Sent: Wednesday, August 8, 2012 8:47 AM
>> > > Subject: Re: consistency, availability and partition pattern of HBase
>> > >
>> > > And consistency is not sacrificed? I.e., will all distributed clients'
>> > > updates be applied sequentially and in real time? Once an update is done
>> > > by one client, can all other clients see the result immediately?
>> > >
>> > > regards,
>> > > Lin
>> > >
>> > > On Wed, Aug 8, 2012 at 11:17 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > I think availability is sacrificed in the sense that if region