Re: Coprocessor Increments
If you allow coprocessors to connect to any other server, the connections
between the HBase nodes form an arbitrary directed graph. This means that
deadlock is possible (e.g. when N1 is blocked waiting for N2, and N2 is
blocked waiting for N1). To guarantee that no deadlock can occur, you have
to be able to represent the connections as a directed acyclic graph (DAG),
and to do that you cannot have nodes connecting back to each other.
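
For illustration, here is a rough sketch of the pattern that creates those
edges, assuming a 0.94-era RegionObserver API (the class name, index table
and key derivation below are hypothetical). The blocking cross-server put
inside postPut is what can close a cycle between region servers:

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical example: keeps an index table in sync from inside the
// region server. The postPut signature shown is the 0.94-era one.
public class IndexObserver extends BaseRegionObserver {

  private static final byte[] INDEX_TABLE = Bytes.toBytes("pt_index"); // hypothetical name

  @Override
  public void postPut(ObserverContext<RegionCoprocessorEnvironment> e,
                      Put put, WALEdit edit, boolean writeToWAL) throws IOException {
    // Synchronous RPC from this region server's handler thread to whichever
    // server hosts the index region. If that server's handlers are themselves
    // blocked on RPCs back to this one, neither side can make progress.
    HTableInterface index = e.getEnvironment().getTable(INDEX_TABLE);
    try {
      Put indexPut = new Put(buildIndexKey(put));
      indexPut.add(Bytes.toBytes("f"), Bytes.toBytes("ref"), put.getRow());
      index.put(indexPut);
    } finally {
      index.close();
    }
  }

  private byte[] buildIndexKey(Put put) {
    // Placeholder: derive the inverted/index row key from the primary put.
    return put.getRow();
  }
}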

You could also achieve this by carefully assigning your regions. If the
primary table (PT) is located exclusively on nodes N1..N3, and the
secondary/index table is located exclusively on nodes N4..N6 (assuming
updates to the secondary/index table never change the PT), the connections
would remain a DAG and deadlock would be impossible.
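
If you go the assignment route, the pinning has to be done (and re-done after
splits or restarts) by hand. A rough sketch of how that could look, assuming a
0.94-era HBaseAdmin API (the table name and server names are hypothetical):

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical example: move every region of the index table onto a fixed
// set of region servers so it never shares a node with the primary table.
public class RegionPinner {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Keep the balancer from undoing the manual placement.
      admin.setBalancerRunning(false, true);

      // Hypothetical target servers, in the "host,port,startcode" form the
      // master reports.
      String[] indexServers = { "n4.example.com,60020,1381700000000",
                                "n5.example.com,60020,1381700000001",
                                "n6.example.com,60020,1381700000002" };

      List<HRegionInfo> regions = admin.getTableRegions(Bytes.toBytes("pt_index"));
      for (int i = 0; i < regions.size(); i++) {
        String target = indexServers[i % indexServers.length];
        admin.move(regions.get(i).getEncodedNameAsBytes(), Bytes.toBytes(target));
      }
    } finally {
      admin.close();
    }
  }
}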

You're right that by putting the business logic into the coprocessor you
gain the ability to easily allow any group to access your cluster. But that
access isn't free. To use a SQL analogy: large organizations always protect
their SQL servers with a DBA. They do this because the potential downsides
of allowing unsupervised and unstructured access are too great.

YMMV

--Tom

On Mon, Oct 14, 2013 at 8:50 AM, Michael Segel <[EMAIL PROTECTED]> wrote:

> Anil,
>
> I wasn't suggesting that you can't do what you're doing, but you end up
> running into the risks which coprocessors are supposed to remove. The
> standard YMMV always applies.
>
> You have a cluster… another team in your company wants to use the cluster.
> So instead of the cluster being a single resource for your app/team, it now
> becomes a shared resource. So now you have people accessing HBase for
> multiple apps.
>
> You could then run multiple HBase HMasters with different locations for
> files, however… this can get messy.
> HOYA seems to suggest this as the future.  If so, then you have to wonder
> about data locality.
>
> Having your app update the primary table and then the secondary index is
> always a good fallback; however, you need to ensure that you understand the
> risks.
>
> With respect to secondary indexes… if you decouple the writes… you can get
> better throughput. Note that the code becomes a bit more complex because
> you're going to have to introduce a couple of different things. But that's
> something for a different discussion…
>
>
> On Oct 13, 2013, at 10:15 AM, anil gupta <[EMAIL PROTECTED]> wrote:
>
> > Inline.
> >
> > On Sun, Oct 13, 2013 at 6:02 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
> >
> >> Ok…
> >>
> >> Sure you can have your app update the secondary index table.
> >> The only issue with that is if someone updates the base table outside of
> >> your app,
> >> they may or may not increment the secondary index.
> >>
> > Anil: We don't allow people to write data into HBase from their own HBase
> > client. We control the writes into HBase. So, we don't have the problem of
> > the secondary index not getting written.
> > For example, if you expose a RESTful web service you can easily control
> > the writes to HBase. Even if a user requests to write one row in the "main
> > table", your application can have the logic to write to the "secondary
> > index" tables. In this way, it is transparent to users as well. You can
> > add/remove secondary indexes as you want.
> >
> >> Note that your secondary index doesn't have to be an inverted table, but
> >> could be SOLR, LUCENE or something else.
> >>
> > Anil: As of now, we are happy with inverted tables as they fit our use
> > case.
> >
> >>
> >> So you really want to do secondary indexes on the server.
> >>
> >> There are a couple of things that could improve the performance, although
> >> the write to the secondary index would most likely lag under heavy load.
> >>
> >>
> >> On Oct 12, 2013, at 11:27 PM, anil gupta <[EMAIL PROTECTED]> wrote:
> >>
> >>> John,
> >>>
> >>> My 2 cents:
> >>> I tried implementing Secondary Index by using Region Observers on Put. It
> >>> works well under low load. But, under heavy load the RO could not keep
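
Regarding the decoupled index writes mentioned above, here is a rough sketch of
one way to do it on the client side, assuming a 0.94-era client API (the class,
table, family and qualifier names are hypothetical): the application writes the
primary table synchronously and hands the index updates to a bounded queue that
a background thread flushes in batches, trading a small index lag for better
throughput.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical example: asynchronous writer for an index table.
public class DecoupledIndexWriter implements Runnable {

  private final BlockingQueue<Put> queue = new LinkedBlockingQueue<Put>(10000);
  private final Configuration conf = HBaseConfiguration.create();

  // Called by the application after the primary-table put has succeeded.
  public void queueIndexUpdate(byte[] indexRow, byte[] primaryRow)
      throws InterruptedException {
    Put p = new Put(indexRow);
    p.add(Bytes.toBytes("f"), Bytes.toBytes("ref"), primaryRow);
    queue.put(p); // blocks when the queue is full, applying back-pressure
  }

  public void run() {
    try {
      HTable index = new HTable(conf, "pt_index"); // hypothetical index table
      try {
        List<Put> batch = new ArrayList<Put>();
        while (!Thread.currentThread().isInterrupted()) {
          batch.add(queue.take());    // wait for at least one queued update
          queue.drainTo(batch, 999);  // then grab whatever else is waiting
          index.put(batch);           // client groups these by region server
          batch.clear();
        }
      } finally {
        index.close();
      }
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    } catch (IOException ioe) {
      // A real implementation would retry or log failed index puts somewhere
      // durable; dropping them silently loses index entries.
    }
  }
}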