Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase user mailing list: consistency, availability and partition pattern of HBase

Lin Ma 2012-08-08, 01:28
Wei Tan 2012-08-08, 04:18
J Mohamed Zahoor 2012-08-08, 04:56
Lin Ma 2012-08-08, 06:56
Mohit Anchlia 2012-08-08, 15:17
Lin Ma 2012-08-08, 15:47
lars hofhansl 2012-08-09, 01:31
Lin Ma 2012-08-09, 02:32
Bryan Beaudreault 2012-08-09, 03:09
lars hofhansl 2012-08-09, 04:21
Lin Ma 2012-08-09, 05:34
Amandeep Khurana 2012-08-09, 06:04
Lin Ma 2012-08-09, 08:18

Re: consistency, availability and partition pattern of HBase
Please read the papers. You'll understand the architecture better that way.

On Aug 9, 2012, at 1:48 PM, Lin Ma <[EMAIL PROTECTED]> wrote:

Thank you Amandeep,

So, logically, can I understand it this way: multiple region servers do
exist for the same region, but they work in active-passive mode, so that
only one server is active at any given time? Is that correct?

regards,
Lin

On Thu, Aug 9, 2012 at 2:04 PM, Amandeep Khurana <[EMAIL PROTECTED]> wrote:

> Correct. You are limited to the throughput of a single region server while
> interacting with a particular region. This throughput limitation is
> typically handled by designing your keys such that your data is distributed
> well across the cluster.
> Having multiple region servers serve a single region gets you into the land
> of maintaining consistency across copies, which is challenging. It might be
> doable but that's not the design choice Bigtable (and hence HBase) made
> initially.
>
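Amandeep's suggestion to design keys so that data is distributed well across the cluster is often implemented by salting row keys. The following is a minimal plain-Java sketch of that idea, not HBase client code: the class `SaltedKeys`, the bucket count `NUM_BUCKETS`, and the `NN-` prefix format are hypothetical choices for illustration. Prefixing each key with a hash-derived bucket spreads sequential keys (such as dates) over several key ranges, and therefore potentially over several regions, instead of sending every write to the same region.

```java
final class SaltedKeys {
    // Hypothetical bucket count (an assumption, not an HBase default); it is
    // often chosen to match the number of pre-split regions. Must be <= 100
    // so the prefix stays two digits.
    static final int NUM_BUCKETS = 8;

    // Prefix the key with a two-digit bucket id derived from its hash,
    // e.g. "20120809-0001" -> "NN-20120809-0001" with NN in [00, NUM_BUCKETS).
    static String salt(String rowKey) {
        int bucket = Math.floorMod(rowKey.hashCode(), NUM_BUCKETS);
        return String.format("%02d-%s", bucket, rowKey);
    }

    // Recover the original key by dropping the 3-character "NN-" prefix.
    static String unsalt(String saltedKey) {
        return saltedKey.substring(3);
    }

    public static void main(String[] args) {
        // Without salting, monotonically increasing keys like these tend to
        // hit the same region; with salting they spread over several prefixes.
        for (int i = 0; i < 5; i++) {
            String key = String.format("20120809-%04d", i);
            System.out.println(key + " -> " + salt(key));
        }
    }
}
```

The trade-off is that a scan over a contiguous range of original keys must now be issued once per bucket and the results merged.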
> On Thu, Aug 9, 2012 at 11:04 AM, Lin Ma <[EMAIL PROTECTED]> wrote:
>
> > Thanks
> >
> > "only a single RegionServer ever hosts a region at once" -- I know HDFS
> > keeps multiple copies of the same file. Does the region server work in an
> > active-passive way, i.e. even if there are multiple copies, only one
> > region server can serve them? If so, won't it become a bottleneck when
> > the traffic to that region is too high?
> >
> > regards,
> > Lin
> >
> > On Thu, Aug 9, 2012 at 11:09 AM, Bryan Beaudreault <[EMAIL PROTECTED]> wrote:
> >
> > > The actual data backing HBase is replicated, but that is handled by HDFS.
> > > Yes, if you lose an HDFS DataNode, clients (in this case the client is
> > > HBase) move to the next node in the pipeline.
> > >
> > > However, only a single RegionServer ever hosts a region at once. If the
> > > RegionServer dies, there is a period where the master must notice that
> > > the regions are unhosted and move them to other RegionServers. During
> > > that period, the data can be neither read nor modified.
> > >
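The unavailability window Bryan describes can be shown with a toy model. In this hypothetical sketch (`FailoverWindow`, `route`, and `reassignFrom` are invented names, and HBase's real recovery procedure involves more steps than a single map update), a region is reachable only while its one hosting server is alive; after that server dies, requests fail until the master reassigns the region.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class FailoverWindow {
    // Hypothetical cluster state: which server hosts each region, and which servers are alive.
    final Map<String, String> regionToServer = new HashMap<>();
    final Set<String> liveServers = new HashSet<>();

    // A request succeeds only if the single hosting server is alive;
    // in this model there is no secondary server to fall back to.
    Optional<String> route(String region) {
        String server = regionToServer.get(region);
        return (server != null && liveServers.contains(server))
                ? Optional.of(server)
                : Optional.empty();
    }

    // Simplified stand-in for the master noticing a dead server and
    // moving all of its regions to another live server.
    void reassignFrom(String deadServer, String target) {
        regionToServer.replaceAll((region, server) -> server.equals(deadServer) ? target : server);
    }

    public static void main(String[] args) {
        FailoverWindow c = new FailoverWindow();
        Collections.addAll(c.liveServers, "rs1", "rs2");
        c.regionToServer.put("regionA", "rs1");

        System.out.println(c.route("regionA")); // Optional[rs1]
        c.liveServers.remove("rs1");             // rs1 dies
        System.out.println(c.route("regionA")); // Optional.empty (region unhosted)
        c.reassignFrom("rs1", "rs2");            // master reassigns the region
        System.out.println(c.route("regionA")); // Optional[rs2]
    }
}
```

The data itself is still safe in HDFS during the window; what is missing is a live server that is allowed to serve it.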
> > > On Wed, Aug 8, 2012 at 10:32 PM, Lin Ma <[EMAIL PROTECTED]> wrote:
> > >
> > > > Thank you Lars.
> > > >
> > > > Is the same data stored as duplicate copies across region servers? If
> > > > so, when the primary server for a region dies, the client just needs to
> > > > read from the secondary server for the same region. Why is there a
> > > > period when data is unavailable?
> > > >
> > > > BTW: please feel free to correct any misconceptions I have about HBase.
> > > >
> > > > regards,
> > > > Lin
> > > >
> > > > On Thu, Aug 9, 2012 at 9:31 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > After a write completes, the next read (regardless of the location it
> > > > > is issued from) will see the latest value.
> > > > > This is because at any given time exactly one RegionServer is
> > > > > responsible for a specific key (through assignment of key ranges to
> > > > > regions and regions to RegionServers).
> > > > >
> > > > > As Mohit said, the trade-off is that data is unavailable if a
> > > > > RegionServer dies, until another RegionServer picks up the regions (and
> > > > > by extension the key range).
> > > > >
> > > > > -- Lars
> > > > >
> > > > >
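Lars's explanation (key ranges map to regions, regions map to RegionServers) can be pictured as a sorted lookup table. The sketch below is a simplified, hypothetical model: `RegionRouter` and the server names are invented, and Java `String` ordering stands in for HBase's byte-wise key ordering (the two agree for ASCII keys). Because each key falls into exactly one region and each region has exactly one assigned server, every read and write for a given key goes to the same place, which is why a completed write is visible to the next read.

```java
import java.util.Map;
import java.util.TreeMap;

final class RegionRouter {
    // Hypothetical region table: start key of each region -> hosting server.
    // Each region covers [startKey, nextStartKey); "" is the first region's start.
    private final TreeMap<String, String> regionStartToServer = new TreeMap<>();

    void assign(String regionStartKey, String server) {
        regionStartToServer.put(regionStartKey, server);
    }

    // The owner of a key is the server hosting the region whose start key
    // is the greatest start key <= rowKey, so there is exactly one owner.
    String serverFor(String rowKey) {
        Map.Entry<String, String> e = regionStartToServer.floorEntry(rowKey);
        if (e == null) {
            throw new IllegalStateException("no region covers key " + rowKey);
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        RegionRouter router = new RegionRouter();
        router.assign("", "rs1");   // keys before "g"
        router.assign("g", "rs2");  // keys from "g" up to (not including) "p"
        router.assign("p", "rs3");  // keys from "p" onward
        System.out.println(router.serverFor("apple")); // rs1
        System.out.println(router.serverFor("hbase")); // rs2
        System.out.println(router.serverFor("zebra")); // rs3
    }
}
```

The same single-owner property is also the source of the availability trade-off: if `rs2` disappears, nothing else answers for keys in `["g", "p")` until that range is assigned elsewhere.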
> > > > > ----- Original Message -----
> > > > > From: Lin Ma <[EMAIL PROTECTED]>
> > > > > To: [EMAIL PROTECTED]
> > > > > Cc:
> > > > > Sent: Wednesday, August 8, 2012 8:47 AM
> > > > > Subject: Re: consistency, availability and partition pattern of HBase
> > > > >
> > > > > And consistency is not sacrificed? I.e., do updates from all
> > > > > distributed clients result in sequential, real-time updates? Once an
> > > > > update is done by one client, can all other clients see the result
> > > > > immediately?
> > > > >
> > > > > regards,
> > > > > Lin
> > > > >
> > > > > On Wed, Aug 8, 2012 at 11:17 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > I think availability is sacrificed in the sense that if region

Amandeep Khurana 2012-08-09, 05:34
Lin Ma 2012-08-09, 05:38
Amandeep Khurana 2012-08-09, 05:41
Lin Ma 2012-08-09, 08:15
Mohit Anchlia 2012-08-09, 05:23