Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

HBase >> mail # dev >> adding constraints


Re: adding constraints
Comments inline.

On Tue, Oct 18, 2011 at 12:43 AM, Gary Helmling <[EMAIL PROTECTED]> wrote:

> >
> > There is an example of how to do Constraints as a jar with CPs already
> > attached to the ticket, and it's pretty simple. However, the ticket goes
> > into the pluses and minuses of a top-level versus a basic CP-based
> > implementation.
> >
> > For me, the best reason for top-level is to make HBase easy to use and
> > have certain built-in features.
>
> Hmm, I wasn't really reading the two implementation options for
> constraints as a choice between a "built-in" feature and CP based.
>

Either way it would be CP-based, but the 'built-in' version would just have some
'nice' ways of adding things. In short, it's a question of adding an
addConstraint() method to the HTD that registers a set of classes to be run by
the 'constraint CP'.
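To make the "built-in is just sugar" point concrete, here is a minimal sketch of an HTD-centred addConstraint(). Everything in it is hypothetical: `Constraint`, `TableDescriptorSketch`, `NoEmptyValues`, the `constraint.` key prefix, and the map standing in for a Put are illustrative names, not HBase's API. The descriptor only records constraint class names as table attributes, and a single bundled constraint CP would read them back later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical user-facing interface: a constraint inspects one put
// (modelled here as column -> value) and says whether it is allowed.
interface Constraint {
    boolean check(Map<String, String> put);
}

// Hypothetical stand-in for an HTableDescriptor, which carries arbitrary
// key/value attributes; this sketch keeps them in a sorted map.
class TableDescriptorSketch {
    // Illustrative key prefix under which constraint class names are stored.
    static final String CONSTRAINT_PREFIX = "constraint.";

    private final String name;
    private final Map<String, String> values = new TreeMap<>();
    private int constraintCount = 0;

    TableDescriptorSketch(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // The convenience method under discussion: it only records the class
    // name so the bundled constraint CP can load and run it later.
    void addConstraint(Class<? extends Constraint> clazz) {
        values.put(CONSTRAINT_PREFIX + constraintCount++, clazz.getName());
    }

    // Constraint class names, in the order they were added.
    List<String> getConstraintClassNames() {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < constraintCount; i++) {
            names.add(values.get(CONSTRAINT_PREFIX + i));
        }
        return Collections.unmodifiableList(names);
    }
}

// Example user constraint: reject puts that contain an empty value.
class NoEmptyValues implements Constraint {
    public boolean check(Map<String, String> put) {
        for (String v : put.values()) {
            if (v == null || v.isEmpty()) {
                return false;
            }
        }
        return true;
    }
}
```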

I could theoretically see a situation where people would want their
constraint to extend some other class (due to legacy code), in which case
having to extend an existing CP is more of a pain.

So, yeah, it still looks like #1 (below), but it's easier to use. And if you
don't want to enable constraints, don't add the constraint jar to the CP
list: no runtime slowdown, and it's still similar to how security is
done.

> I'm reading it as a choice between:
> 1) a bundled CP implementation (which you still have to _enable_) that
> does constraint checking loading user classes that implement a simple
> interface (Constraint or Predicate<Put> or whatever)
> 2) an abstract CP example class that you have to extend with your own
> implementation logic, which, if you want to do it right, you'll still
> wind up with something resembling #1 anyway
>
> FYI, I see option #1 as fairly analogous to the bundled aggregation
> client that Lars mentioned.
>
> If you want this as real top-level functionality built directly in to,
> say, the HRegion code paths for puts, the question is why should we
> add the complexity directly when we have CPs?
>
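Option #1 can be sketched as one generic processor that is configured with class names, instantiates them reflectively, and runs each against incoming puts. This is a minimal illustration under stated assumptions, not HBase code. `Constraint`, `ConstraintChecker`, `ConstraintViolation`, `RequiresId`, and the map-based put are hypothetical. A real version would call `check()` from a RegionObserver's prePut hook rather than directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical minimal interface that user constraint classes implement.
interface Constraint {
    // Returns null if the put is acceptable, otherwise a reason to reject it.
    String violation(Map<String, String> put);
}

// Hypothetical exception thrown when a put fails a constraint.
class ConstraintViolation extends RuntimeException {
    ConstraintViolation(String message) {
        super(message);
    }
}

// Sketch of the bundled, generic processor from option #1: it holds no user
// logic, it only loads the configured classes and runs them in order.
class ConstraintChecker {
    private final List<Constraint> constraints = new ArrayList<>();

    ConstraintChecker(List<String> classNames) throws ReflectiveOperationException {
        for (String name : classNames) {
            // asSubclass throws ClassCastException if the configured class
            // does not implement Constraint, so misconfiguration fails fast.
            Class<? extends Constraint> clazz =
                Class.forName(name).asSubclass(Constraint.class);
            constraints.add(clazz.getDeclaredConstructor().newInstance());
        }
    }

    // Rejects the put on the first failing constraint.
    void check(Map<String, String> put) {
        for (Constraint c : constraints) {
            String reason = c.violation(put);
            if (reason != null) {
                throw new ConstraintViolation(c.getClass().getSimpleName() + ": " + reason);
            }
        }
    }
}

// Example user class: every put must include an "info:id" column.
class RequiresId implements Constraint {
    public String violation(Map<String, String> put) {
        return put.containsKey("info:id") ? null : "missing info:id";
    }
}
```

Because the checker depends only on the interface, a user class can still extend whatever legacy base class it needs. That sidesteps the cost of having to extend an existing CP.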

I feel like having addConstraint() on a table is actually _less_
complexity. Not necessarily from the overall system perspective
(you have to do a little abstraction and add a couple more methods), but it's
not that much more, as it's all centered around the HTD.
>
> > Yeah, we can do security, but you have to include
> > the jars, make sure it works, etc., as opposed to _certain_ systems where
> > security is built in. Similar arguments can be made for things like
> > constraints - it's just _easier_ to have them built in, and let people use
> > them (or not) as they choose.
> >
> >
>
> We have a security implementation up for review that provides
> meaningful security.  Yes, it has to be enabled to be used and the
> process of configuring it could be much simpler.  Security is always a
> matter of trade-offs.  You can argue about whether or not we've
> made the right ones.  But the current approach for security was
> arrived at as a result of extensive discussions with the entire
> community about the right approach, where many concerns were raised
> about paying any overhead for security when it was not being used.  As
> a result, all security components were built in a loadable fashion,
> with the trade-off of some extra configuration complexity.
>
> Yes, Accumulo has "security" always enabled.  But this is still not an
> apples-to-apples comparison.  HBase security relies on Kerberos to
> provide a trusted third party for strong authentication while never
> sending the password over the wire.  Accumulo sends username and
> password in plain text on the RPC connections.  As a result HBase
> relies on external systems for managing credentials, while Accumulo
> embeds its own user database, with the usernames and hashed passwords
> stored as globally readable znodes in zookeeper.  You could say that
> reliance on an external system makes the HBase setup more complex, but
> that's a narrow view.  While managing an internal user database does
> keep things self contained, it also forces you to create usernames and

Sorry for bringing security up flippantly - clearly you guys have thought
about that a lot, and I wasn't trying to imply that you hadn't. Yeah,
Accumulo has a different model (which clearly has its flaws) and is running in
an arguably very different environment (with different requirements) than
most people running HBase. I think it makes sense to keep security from
impacting performance by making it loadable. However, loading should be easy.

What I'm concerned about is the configuration complexity - there are already
a ton of configuration values, and adding more starts to get crazy. HBase has
already made some trade-offs, but if we keep adding more and more configuration
values, it's going to be close to unusable for anyone who doesn't have serious
knowledge of the system and how to configure it.

I would rather make it dead simple for people looking at the main interface
calls (e.g. "ok, here is where I add a coprocessor", "here is where I enable
security", "here is where I add a constraint", etc.) than have them dig
through the conf, even when all they have to do is enable the constraint CP
or the security CP. Right now you just need to add it to the list of
regionserver CPs, but what if you only had to set a boolean? Let's go super
easy. Heck, security is getting its own module (HBASE-4336), so it's
reasonable to think that we can include some configuration-specific stuff to
support that.
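The "just set a boolean" idea could sit on top of the existing list-based loading without replacing it. Here is a minimal sketch under these assumptions. `hbase.coprocessor.region.classes` is the existing comma-separated list of region coprocessors. `hbase.constraints.enabled` and `org.example.ConstraintProcessor` are hypothetical names invented for illustration. `java.util.Properties` stands in for HBase's `Configuration`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class CoprocessorConfigSketch {
    // Existing HBase key: comma-separated region coprocessor class names.
    static final String REGION_CPS = "hbase.coprocessor.region.classes";
    // Hypothetical convenience flag; not a real HBase setting.
    static final String CONSTRAINTS_ENABLED = "hbase.constraints.enabled";
    // Hypothetical class name for the bundled constraint processor.
    static final String CONSTRAINT_CP = "org.example.ConstraintProcessor";

    // Expands the boolean into the list the loader already understands, so a
    // disabled feature never appears in the list and adds no runtime cost.
    static List<String> regionCoprocessors(Properties conf) {
        List<String> cps = new ArrayList<>();
        String listed = conf.getProperty(REGION_CPS, "").trim();
        if (!listed.isEmpty()) {
            for (String name : listed.split(",")) {
                if (!name.trim().isEmpty()) {
                    cps.add(name.trim());
                }
            }
        }
        boolean enabled = Boolean.parseBoolean(conf.getProperty(CONSTRAINTS_ENABLED, "false"));
        if (enabled && !cps.contains(CONSTRAINT_CP)) {
            cps.add(CONSTRAINT_CP);
        }
        return cps;
    }
}
```

The same expansion could serve security: a single flag maps to the CP class names a user would otherwise have to look up and list by hand.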

OK, clearly the main thread through all of this is that I would like to make
it easier to load/unload features.

Constraints were something that (a) I thought HBase could use, (b) would be
doable pretty easily with CPs, and (c) would put us on the path toward making
HBase easier to run and set up for users. The latter goes for security,
constraints, and other new/existing features.

-Jesse