Re: adding constraints
>
> There is an example of how to do Constraints as a jar with CPs already
> attached to the ticket, and it's pretty simple. However, the ticket goes into
> the plusses and minuses for a top-level or just basic CP based
> implementation.
>
> For me, the best reason for top level is to make HBase easy to use and have
> certain built-in features.

Hmm, I wasn't really reading the two implementation options for
constraints as a choice between a "built-in" feature and CP based.
I'm reading it as a choice between:
1) a bundled CP implementation (which you still have to _enable_) that
does constraint checking by loading user classes that implement a simple
interface (Constraint or Predicate<Put> or whatever)
2) an abstract CP example class that you have to extend with your own
implementation logic and that, if you want to do it right, will still
wind up resembling #1 anyway
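
To make option #1 concrete, here is a rough sketch of the shape I have
in mind.  It is not the code attached to the ticket and it does not use
the real HBase classes: Put, Constraint, NonNegativeInt and
ConstraintChecker below are all hypothetical stand-ins, and the real
thing would hook into the coprocessor framework rather than being
called by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConstraintSketch {

    /** Hypothetical stand-in for HBase's Put: just a row key and one value. */
    public static final class Put {
        final String row;
        final String value;

        public Put(String row, String value) {
            this.row = row;
            this.value = value;
        }
    }

    /** The "simple interface" users would implement (name is an assumption). */
    public interface Constraint {
        /** Throws IllegalArgumentException to reject the put. */
        void check(Put put);
    }

    /** Example user-supplied constraint: values must be non-negative integers. */
    public static final class NonNegativeInt implements Constraint {
        @Override
        public void check(Put put) {
            int n;
            try {
                n = Integer.parseInt(put.value);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("not an integer: " + put.value);
            }
            if (n < 0) {
                throw new IllegalArgumentException("negative value: " + n);
            }
        }
    }

    /** Plays the role of the bundled CP: loads user classes by name, runs them before a put. */
    public static final class ConstraintChecker {
        private final List<Constraint> constraints = new ArrayList<>();

        public ConstraintChecker(List<String> classNames) {
            for (String name : classNames) {
                try {
                    // Load the user class and make sure it implements Constraint.
                    Class<? extends Constraint> c =
                        Class.forName(name).asSubclass(Constraint.class);
                    constraints.add(c.getDeclaredConstructor().newInstance());
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("cannot load constraint " + name, e);
                }
            }
        }

        /** Runs every configured constraint; the first failure rejects the put. */
        public void prePut(Put put) {
            for (Constraint c : constraints) {
                c.check(put);
            }
        }
    }

    public static void main(String[] args) {
        ConstraintChecker checker = new ConstraintChecker(
            Arrays.asList(NonNegativeInt.class.getName()));
        checker.prePut(new Put("row1", "42"));      // accepted
        try {
            checker.prePut(new Put("row2", "-1"));  // rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point is that the user only writes something like NonNegativeInt;
the loading and the put hook live in the bundled piece.  Anyone who
extends an abstract example class (option #2) and wants to support more
than one constraint ends up writing their own ConstraintChecker.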

FYI, I see option #1 as fairly analogous to the bundled aggregation
client that Lars mentioned.

If you want this as real top-level functionality built directly into,
say, the HRegion code paths for puts, the question is why should we
add the complexity directly when we have CPs?

> Yeah, we can do security, but you have to include
> the jars, make sure it works, etc. As opposed to _certain_ systems where
> security is built in. Similar arguments can be made for things like
> constraints - it's just _easier_ to have it built in, and let people use them
> (or not) as they choose.
>

We have a security implementation up for review that provides
meaningful security.  Yes, it has to be enabled to be used and the
process of configuring it could be much simpler.  Security is always a
matter of trade-offs.  You can argue about whether or not we've
made the right ones.  But the current approach for security was
arrived at as a result of extensive discussions with the entire
community about the right approach, where many concerns were raised
about paying any overhead for security when it was not being used.  As
a result, all security components were built in a loadable fashion,
with the trade-off of some extra configuration complexity.

Yes, Accumulo has "security" always enabled.  But this is still not an
apples-to-apples comparison.  HBase security relies on Kerberos to
provide a trusted third party for strong authentication while never
sending the password over the wire.  Accumulo sends username and
password in plain text on the RPC connections.  As a result, HBase
relies on external systems for managing credentials, while Accumulo
embeds its own user database, with the usernames and hashed passwords
stored as globally readable znodes in zookeeper.  You could say that
reliance on an external system makes the HBase setup more complex, but
that's a narrow view.  While managing an internal user database does
keep things self-contained, it also forces you to create usernames and
passwords for an application in multiple places (your application does
run under its own account, right?), adding its own complexity.
Accumulo allows access control labels to be placed on each key value
individually, while HBase uses a simpler model for assignments limited
to table, column family, or column qualifier scope.

Each system makes its own trade-offs based on its implementation
goals.  What's right for you is going to depend on your needs.  But
the HBase approach did not just disregard simplicity willy-nilly.

> The ticket also talks about abstracting out some of the CP things to make it
> easier to add other top level features, which would be a win too. Yeah, they
> would be backed by CPs, but that doesn't mean it doesn't make sense for
> people to use the stuff really (as in dead simple) easily.
>

Again, I don't really see the other changes discussed (HBASE-4554?) as
top-level vs. CP-based.  I think that change is just about providing
the shell with the ability to easily set arbitrary attributes on
HTableDescriptor.  Those already exist; they're just not properly
exposed in the shell.  Maybe you're envisioning something beyond this
for the constraints case?  That may be good too, but we should
probably move the discussion over to the JIRA.

It may not sound like it, but I'm all in favor of making things as
simple as possible.  It's just that, when simplifying, you're usually
moving complexity from one place to another.  So let's work out where
we can get the biggest benefit.