HBase >> mail # dev >> [Shadow Regions / Read Replicas ]


Jonathan Hsieh 2013-12-03, 05:54
Jonathan Hsieh 2013-12-03, 06:01
Enis Söztutar 2013-12-03, 19:07
Jonathan Hsieh 2013-12-03, 19:51
Vladimir Rodionov 2013-12-03, 20:31
Devaraj Das 2013-12-03, 22:11
Enis Söztutar 2013-12-03, 22:18
Vladimir Rodionov 2013-12-03, 22:48
Jonathan Hsieh 2013-12-04, 02:18
谢良 2013-12-07, 13:39
Enis Söztutar 2013-12-09, 21:24
谢良 2013-12-13, 05:47
Enis Söztutar 2013-12-03, 22:04
Jonathan Hsieh 2013-12-04, 02:47
Jimmy Xiang 2013-12-04, 03:59
Jimmy Xiang 2013-12-04, 04:06
Devaraj Das 2013-12-04, 06:20
Re: [Shadow Regions / Read Replicas ]
I am concerned about reading stale data. I understand some people may want
this feature, and one of the reasons is region availability. If we
make sure those regions are always available, we don't have to compromise,
right? How about we support something like a region pipeline? For each
important region, we assign it to two or three region servers and make sure
all writes go to all of the region instances. Either just one of them persists
data to the hlog, or each region instance has its own local hlog (on the local
fs, not hdfs). Is this too complex to consider, or is the write overhead too high?
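
To make the pipeline idea concrete, here is a minimal in-memory sketch of the variant where only one instance keeps an hlog. It is only an illustration under stated assumptions: `RegionInstance`, `RegionPipeline`, and `RegionPipelineSketch` are hypothetical names, not HBase classes, and a plain list stands in for the log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One in-memory copy of a region (hypothetical; not an HBase class). */
class RegionInstance {
    final Map<String, String> memstore = new HashMap<>();
    final List<String> hlog = new ArrayList<>(); // stand-in for a write-ahead log
    final boolean persistsLog;

    RegionInstance(boolean persistsLog) {
        this.persistsLog = persistsLog;
    }

    void apply(String row, String value) {
        if (persistsLog) {
            hlog.add(row + "=" + value); // log first, then apply, as a WAL would
        }
        memstore.put(row, value);
    }
}

/** Sends every write to all instances before returning (hypothetical). */
class RegionPipeline {
    final List<RegionInstance> instances = new ArrayList<>();

    RegionPipeline(int copies) {
        for (int i = 0; i < copies; i++) {
            instances.add(new RegionInstance(i == 0)); // only the first instance logs
        }
    }

    /** Returns only after every instance holds the write, so no copy is stale. */
    void put(String row, String value) {
        for (RegionInstance r : instances) {
            r.apply(row, value);
        }
    }

    String get(int instance, String row) {
        return instances.get(instance).memstore.get(row);
    }
}

class RegionPipelineSketch {
    public static void main(String[] args) {
        RegionPipeline pipeline = new RegionPipeline(3);
        pipeline.put("row1", "v1");
        for (int i = 0; i < 3; i++) {
            System.out.println("instance " + i + " -> " + pipeline.get(i, "row1"));
        }
    }
}
```

In this sketch a write is acknowledged only after all copies have applied it, which is where the write overhead in question would come from.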
On Tue, Dec 3, 2013 at 10:20 PM, Devaraj Das <[EMAIL PROTECTED]> wrote:

> On Tue, Dec 3, 2013 at 6:47 PM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
>
> > On Tue, Dec 3, 2013 at 2:04 PM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
> >
> > > On Tue, Dec 3, 2013 at 11:51 AM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> > >
> > > > On Tue, Dec 3, 2013 at 11:07 AM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Thanks Jon for bringing this to dev@.
> > > > >
> > > > >
> > > > > On Mon, Dec 2, 2013 at 10:01 PM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Fundamentally, I'd prefer focusing on making HBase "HBasier" instead of
> > > > > > tackling a feature that other systems architecturally can do better
> > > > > > (inconsistent reads). I consider consistent reads/writes to be one of
> > > > > > HBase's defining features. That said, I think read replicas make sense
> > > > > > and are a nice feature to have.
> > > > > >
> > > > >
> > > > > Our design proposal has a specific use case goal, and hopefully we can
> > > > > demonstrate the benefits of having this in HBase so that even more pieces
> > > > > can be built on top of this. Plus I imagine this will be a widely used
> > > > > feature for read-only tables or bulk loaded tables. We are not proposing
> > > > > to rework strong consistency semantics or to make major architectural
> > > > > changes. I think that defining tables with a replication count, together
> > > > > with the proposed client API changes (Consistency definition), plugs into
> > > > > the HBase model rather well.
> > > > >
> > > > >
> > > > I do think that without any mechanism for picking up recent updates, we
> > > > are limiting the usefulness of this feature to essentially *only* the
> > > > read-only or bulk-load-only tables. If there were any edits/updates,
> > > > recency would be severely lagging (by default potentially an hour),
> > > > especially in cases where there are only a few edits to a primarily bulk
> > > > loaded table. This limitation is not mentioned in the tradeoffs or
> > > > requirements (or a non-requirements section) and definitely should be
> > > > listed there.
> > > >
> > >
> > > Obviously the amount of lag you would observe depends on whether you are
> > > using "Region snapshots", "WAL-Tailing" or "Async wal replication". I think
> > > there are still use cases where you can live with >1 hour old stale reads,
> > > so "Region snapshots" is not *just* for read-only tables. I'll add these to
> > > the tradeoffs section.
> > >
> >
> > Thanks for adding it there -- I really think it is a big headline caveat on
> > my expectation of "eventual consistency". Other systems out there give you
> > eventual consistency at the millisecond level for most cases, while in this
> > initial implementation "eventual" would mean 10's of minutes or even
> > handfuls of minutes behind (with the snapshots flush mechanism)!
> >
> >
> But that's just how the implementation is broken up currently. When WAL
> tailing is implemented, we will be close, maybe on the order of seconds
> behind.
>
>
> > There are a handful of other things in the phase one part of the
> > implementation section that limit the usefulness of the feature to a
Enis Söztutar 2013-12-04, 22:23
Enis Söztutar 2013-12-04, 22:00
Stack 2013-12-04, 22:47
Enis Söztutar 2013-12-04, 23:46
Stack 2013-12-04, 23:56
Jonathan Hsieh 2013-12-13, 01:38
Devaraj Das 2013-12-03, 19:26