HBase >> mail # user >> consistency, availability and partition pattern of HBase
Re: consistency, availability and partition pattern of HBase
HDFS also chooses to degrade availability in the face of partitions.

On Thu, Aug 9, 2012 at 11:08 AM, Lin Ma <[EMAIL PROTECTED]> wrote:

> Amandeep, thanks for your comments, and I will definitely read the paper
> you suggested.
>
> For Hadoop itself, what do you think its CAP characteristics are? Which of
> the three CAP properties is sacrificed?
>
> regards,
> Lin
>
> On Thu, Aug 9, 2012 at 1:34 PM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:
>
>> Firstly, I recommend you read the GFS and Bigtable papers. That'll give you
>> a good understanding of the architecture. Ad hoc questions on the mailing
>> list won't.
>>
>> I'll try to answer some of your questions briefly. Think of HBase as a
>> database layer over an underlying filesystem (the same way MySQL is over
>> ext2/3/4 etc). The filesystem for HBase in this case is HDFS. HDFS
>> replicates data for redundancy and fault tolerance. HBase has region
>> servers that serve the regions. Regions form tables. Region servers persist
>> their data on HDFS. Now, every region is served by one and only one region
>> server. So, HBase is not replicating anything. Replication is handled at
>> the storage layer. If a region server goes down, all its regions now need
>> to be served by some other region server. During this period of region
>> assignment, the clients experience degraded availability if they try to
>> interact with any of those regions.
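The failover behaviour described above can be sketched as a toy model. This is only an illustration under stated assumptions, not HBase code: `ToyCluster`, `RegionUnavailable`, the round-robin assignment and the in-memory `storage` dict (a stand-in for HDFS) are all hypothetical names invented for this example.

```python
class RegionUnavailable(Exception):
    """Raised when a region has no live region server assigned to it."""


class ToyCluster:
    """Toy model: each region is served by exactly one region server."""

    def __init__(self, servers, regions):
        self.live = set(servers)
        ordered = sorted(servers)
        # Assumption: simple round-robin assignment of regions to servers.
        self.assignment = {
            region: ordered[i % len(ordered)] for i, region in enumerate(regions)
        }
        # Stand-in for HDFS: data outlives the region server that wrote it.
        self.storage = {region: {} for region in regions}

    def _check_owner(self, region):
        server = self.assignment.get(region)
        if server is None or server not in self.live:
            raise RegionUnavailable(region)
        return server

    def put(self, region, key, value):
        self._check_owner(region)
        self.storage[region][key] = value

    def get(self, region, key):
        self._check_owner(region)
        return self.storage[region].get(key)

    def kill(self, server):
        """Simulate a region server falling off the cluster."""
        self.live.discard(server)

    def reassign(self):
        """Hand the regions of dead servers to surviving servers."""
        survivors = sorted(self.live)
        if not survivors:
            raise RuntimeError("no live region servers")
        for i, (region, server) in enumerate(sorted(self.assignment.items())):
            if server not in self.live:
                self.assignment[region] = survivors[i % len(survivors)]


if __name__ == "__main__":
    cluster = ToyCluster(["rs1", "rs2"], ["region-a", "region-b"])
    cluster.put("region-a", "row1", "v1")
    cluster.kill("rs1")  # region-a loses its only server
    try:
        cluster.get("region-a", "row1")
    except RegionUnavailable:
        print("region-a unavailable until reassignment")
    cluster.reassign()
    print(cluster.get("region-a", "row1"))  # the data was never lost
```

Between `kill` and `reassign`, requests to the affected region fail while the other region keeps serving. Nothing is lost, because the data lives in the storage layer rather than in the server that died.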
>>
>> Coming back to CAP. HBase chooses to degrade availability in the face of
>> partitions. "Partition" is a very general term here and does not
>> necessarily mean network partitions. Any node falling off the HBase cluster
>> can be considered to be a partition. So, when failures happen, HBase
>> degrades availability but does not give up consistency. Consistency in this
>> context is sort of the equivalent of atomicity in ACID. In the context of
>> HBase, any copy of data that is written to HBase will be visible to all
>> clients. There is no concept of multiple different versions that the
>> clients need to reconcile between. When you read, you always get the same
>> version of the row you are reading. In other words, HBase is strongly
>> consistent.
>>
>> Hope that clears things up a bit.
>>
>> On Thu, Aug 9, 2012 at 8:02 AM, Lin Ma <[EMAIL PROTECTED]> wrote:
>>
>> > Thank you Lars.
>> >
>> > Is the same data duplicated across region servers? If so, when the
>> > primary server for a region dies, the client would just need to read from
>> > the secondary server for the same region. Why is there a period when the
>> > data is unavailable?
>> >
>> > BTW: please feel free to correct any misunderstanding I have about HBase.
>> >
>> > regards,
>> > Lin
>> >
>> > On Thu, Aug 9, 2012 at 9:31 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>> >
>> > > After a write completes, the next read (regardless of the location it is
>> > > issued from) will see the latest value.
>> > > This is because at any given time exactly one RegionServer is responsible
>> > > for a specific key
>> > > (through assignment of key ranges to regions and regions to RegionServers).
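The routing idea above (key ranges map to regions, regions map to RegionServers) can be sketched roughly as follows. This is a minimal illustration, not the HBase client API: `ToyTable`, `locate`, the split keys and the server names are hypothetical, and regions are assumed to cover half-open ranges with an inclusive start key.

```python
import bisect


class ToyTable:
    """Toy model: sorted split keys define regions; each region has one owner."""

    def __init__(self, split_keys, servers):
        # N split keys give N + 1 regions covering the whole key space.
        self.split_keys = sorted(split_keys)
        n_regions = len(self.split_keys) + 1
        # Assumption: round-robin assignment; exactly one server per region.
        self.region_server = [servers[i % len(servers)] for i in range(n_regions)]
        # One store per region, so there is a single copy of every row.
        self.region_data = [{} for _ in range(n_regions)]

    def locate(self, row_key):
        """Return (region index, owning server) for a row key."""
        region = bisect.bisect_right(self.split_keys, row_key)
        return region, self.region_server[region]

    def put(self, row_key, value):
        region, _ = self.locate(row_key)
        self.region_data[region][row_key] = value

    def get(self, row_key):
        region, _ = self.locate(row_key)
        return self.region_data[region].get(row_key)


if __name__ == "__main__":
    table = ToyTable(["g", "p"], ["rs1", "rs2"])
    print(table.locate("melon"))  # region 1, served by rs2
    table.put("melon", "v1")
    table.put("melon", "v2")
    print(table.get("melon"))  # v2: every reader is routed to the same region
```

Because every client resolves a given key to the same single region, there is only one copy to read. A read issued after a completed write cannot observe an older value, and there are no divergent replicas to reconcile.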
>> > >
>> > >
>> > > As Mohit said, the trade-off is that data is unavailable if a RegionServer
>> > > dies until another RegionServer picks up the regions (and by extension the
>> > > key range).
>> > >
>> > > -- Lars
>> > >
>> > >
>> > > ----- Original Message -----
>> > > From: Lin Ma <[EMAIL PROTECTED]>
>> > > To: [EMAIL PROTECTED]
>> > > Cc:
>> > > Sent: Wednesday, August 8, 2012 8:47 AM
>> > > Subject: Re: consistency, availability and partition pattern of HBase
>> > >
>> > > And consistency is not sacrificed? I.e., will all distributed clients'
>> > > updates be applied sequentially and in real time? Once an update is done by
>> > > one client, can all other clients see the result immediately?
>> > >
>> > > regards,
>> > > Lin
>> > >
>> > > On Wed, Aug 8, 2012 at 11:17 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > I think availability is sacrificed in the sense that if region