HBase >> mail # dev >> Re: [Shadow Regions / Read Replicas ] External replication disqualified?


Re: [Shadow Regions / Read Replicas ] External replication disqualified?
On Tue, Dec 3, 2013 at 6:49 PM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:

> The read replicas doc mentions something a little more intrusive in the "3
> options" section but doesn't seem to disqualify it.
>
>
I don't quite see what you are referring to. Could you please copy-paste
the relevant line from the design doc?
> Relatedly, just as another strawman: for the "mostly read only" and
> "bulk load only" use cases, why not use normal replication against two
> clusters in the same HDFS / datacenter and add a "bulk load replication"
> feature?
>
>
We considered this. The issue is that HDFS resource usage for the store
files would be doubled in the two-replica case.
> We'd get latency in the seconds (closer to my expected definition of
> eventual consistency)
>
> Jon
>
> On Tue, Dec 3, 2013 at 6:47 PM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
>
> >
> >
> > On Tue, Dec 3, 2013 at 2:04 PM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
> >
> >> On Tue, Dec 3, 2013 at 11:51 AM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> >> >
> >> > On Tue, Dec 3, 2013 at 11:07 AM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
> >> >
> >> > > Thanks Jon for bringing this to dev@.
> >> > >
> >> > >
> >> > > On Mon, Dec 2, 2013 at 10:01 PM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> >> > >
> >> > > > Fundamentally, I'd prefer focusing on making HBase "HBasier"
> >> > > > instead of tackling a feature that other systems architecturally
> >> > > > can do better (inconsistent reads). I consider consistent
> >> > > > reads/writes to be one of HBase's defining features. That said,
> >> > > > I think read replicas make sense and are a nice feature to have.
> >> > > >
> >> > >
> >> > > Our design proposal has a specific use case goal, and hopefully we
> >> > > can demonstrate the benefits of having this in HBase so that even
> >> > > more pieces can be built on top of it. Plus, I imagine this will be
> >> > > a widely used feature for read-only tables or bulk loaded tables.
> >> > > We are not proposing to rework strong consistency semantics or to
> >> > > make major architectural changes. I think that having tables
> >> > > defined with a replication count, together with the proposed client
> >> > > API changes (Consistency definition), plugs into the HBase model
> >> > > rather well.
> >> > >
> >> > >
> >> > I do think that without any recent-update mechanism, we are
> >> > limiting the usefulness of this feature to essentially *only* the
> >> > read-only or bulk-load-only tables. Recency, if there were any
> >> > edits/updates, would be severely lagging (by default potentially an
> >> > hour), especially in cases where there are only a few edits to a
> >> > primarily bulk loaded table. This limitation is not mentioned in the
> >> > tradeoffs or requirements (or a non-requirements section) and
> >> > definitely should be listed there.
> >> >
> >>
> >> Obviously the amount of lag you would observe depends on whether you
> >> are using "Region snapshots", "WAL-Tailing" or "Async wal
> >> replication". I think there are still use cases where you can live
> >> with >1 hour old stale reads, so that "Region snapshots" is not *just*
> >> for read-only tables. I'll add these to the tradeoffs section.
> >>
> >
> > Thanks for adding it there -- I really think it is a big headline caveat
> > on my expectation of "eventual consistency". Other systems out there
> > give you eventual consistency on the millisecond level for most cases,
> > while in this initial implementation "eventual" would mean 10s of minutes
> > or even handfuls of minutes behind (with the snapshots flush mechanism)!
> >
> > There are a handful of other things in the phase one part of the
> > implementation section that limit the usefulness of the feature to a
> > certain kind of constrained HBase user. I'll start another thread for
> > those.
