Re: [Shadow Regions / Read Replicas ]
On Tue, Dec 3, 2013 at 12:31 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:

> The downside:
>
> - Double/triple memstore usage
> - Increased block cache usage (effectively, the block cache will have 50%
> of its capacity, maybe less)
>
> These downsides are pretty serious ones. This will result in:
>
> 1. Decreased overall performance due to a smaller effective block cache
> size
> 2. More frequent memstore flushes, which will affect compaction and
> write throughput.
>
>
The thing is that this is configurable on a per-table basis. Depending on
the hardware characteristics, one may choose not to have more than one
replica per region. Certain classes of application and cluster combinations
can still benefit from this.

> I do not believe that HBase's 'large' MTTR prevents meeting a 99% SLA
> of 10-20ms unless your RSs go down 2-3 times a day for several minutes each
> time. You have to first analyze why you are having such frequent failures,
> then fix the root cause of the problem. It's possible to reduce the
> 'detection' phase of the MTTR process to a couple of seconds, either by
> using an external beacon process (as I suggested already) or by rewriting
> some code inside HBase and the NameNode to move all data from the Java heap
> to off-heap, reducing GC-induced timeouts from 30 sec to 1-2 sec max. It's
> tough, but doable. The result: you will decrease MTTR by at least 50% w/o
> sacrificing overall cluster performance.
>
> I think it's the large RS and NN heaps and frequent s-t-w GC activity that
> prevent meeting a strict SLA - not occasional server failures.
>
>
>
Possibly. Work on better MTTR and handling of GC issues will continue, no
doubt. But there is still a window of time when certain regions are
unavailable. We want to address that for applications that can tolerate
eventual consistency.
>
> On Tue, Dec 3, 2013 at 11:51 AM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
>
> > To keep the discussion focused on the design goals, I'm going to start
> > referring to Enis and Devaraj's eventually consistent read replicas as
> > the *read replica* design, and the consistent fast read recovery
> > mechanism based on shadowing/tailing the WALs as *shadow regions* or
> > *shadow memstores*. Can we agree on nomenclature?
> >
> >
> > On Tue, Dec 3, 2013 at 11:07 AM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
> >
> > > Thanks Jon for bringing this to dev@.
> > >
> > >
> > > On Mon, Dec 2, 2013 at 10:01 PM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> > >
> > > > > Fundamentally, I'd prefer focusing on making HBase "HBasier" instead
> > > > > of tackling a feature that other systems architecturally can do
> > > > > better (inconsistent reads). I consider consistent reads/writes to be
> > > > > one of HBase's defining features. That said, I think read replicas
> > > > > make sense and are a nice feature to have.
> > > >
> > >
> > > > Our design proposal has a specific use case goal, and hopefully we can
> > > > demonstrate the benefits of having this in HBase so that even more
> > > > pieces can be built on top of it. Plus, I imagine this will be a
> > > > widely used feature for read-only tables or bulk-loaded tables. We are
> > > > not proposing to rework strong consistency semantics or to make major
> > > > architectural changes. I think that defining tables with a replication
> > > > count, together with the proposed client API changes (Consistency
> > > > definition), plugs into the HBase model rather well.
> > >
> > >
> > I do think that without any recency-updating mechanism, we are
> > limiting the usefulness of this feature to essentially *only*
> > read-only or bulk-load-only tables. Recency, if there were any
> > edits/updates, would lag severely (by default potentially an hour),
> > especially in cases where there are only a few edits to a primarily
> > bulk-loaded table. This limitation is not mentioned in the tradeoffs or
> > requirements (or a non-requirements section) and definitely should be
> > listed there.
