|
Jesse Yates
2011-10-17, 17:45
Ted Yu
2011-10-17, 18:00
Jesse Yates
2011-10-17, 18:04
Ted Yu
2011-10-17, 18:10
Jesse Yates
2011-10-17, 18:27
lars hofhansl
2011-10-18, 05:00
Jesse Yates
2011-10-18, 05:24
Gary Helmling
2011-10-18, 07:43
Andrew Purtell
2011-10-18, 22:31
Andrew Purtell
2011-10-18, 22:42
Jesse Yates
2011-10-19, 00:49
Gary Helmling
2011-10-19, 02:10
Jesse Yates
2011-10-19, 02:41
Gary Helmling
2011-10-19, 18:18
Jesse Yates
2011-10-19, 18:29
|
-
adding constraints
Jesse Yates 2011-10-17, 17:45
Hey everyone,

TL;DR Adding classic DB constraints as a system-level coprocessor to help simplify using HBase and ease adoption.

Coprocessors are a really powerful mechanism and are incredibly useful for a variety of things. However, I feel like the mechanism for using them can be very daunting and, for certain features, could do with some simplification.

What I would like to propose is a simple interface that people can use to implement a 'constraint' (matching the classic database definition). This would help HBase more easily check that box, help minimize code duplication across organizations, and lead to easier adoption.

Essentially, people would implement a 'Constraint' interface for checking keys before they are put into a table. Puts that are valid get written to the table; if a put is invalid, the constraint throws an exception that gets propagated back to the client explaining why the put was invalid.

Constraints would be set on a per-table basis and the user would be expected to ensure the jars containing the constraint are present on the machines serving that table.

Yes, people could roll their own mechanism for doing this via coprocessors each time, but this would make it easier to do so: you only have to implement a very minimal interface and not worry about the specifics.

If people are interested, I would like to open a Jira on the feature. I've got a basic implementation, but would like to expand it to be a more integrated, top-level element of the code. I just don't want to waste my time doing a full-blown impl and then not have at least general consensus on it being a good feature.

One of the complaints I commonly hear about HBase is that, to outsiders, it is difficult to figure out and use (though once you do, it's solid). This would be a step to make it easier to use and adopt.

Thanks,
Jesse Yates
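For readers following the proposal, here is a minimal sketch of what a per-table 'Constraint' check could look like. It is not the implementation attached to the ticket: `SimplePut`, `ConstraintException`, `NonEmptyValue` and `ConstrainedTable` are hypothetical stand-ins invented for this illustration, and the sketch deliberately avoids the real HBase classes so that it runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

class ConstraintSketch {

    /** Hypothetical, simplified stand-in for an HBase Put: one row key, one value. */
    static final class SimplePut {
        final String row;
        final String value;

        SimplePut(String row, String value) {
            this.row = row;
            this.value = value;
        }
    }

    /** Thrown when a put violates a constraint; the message explains why. */
    static final class ConstraintException extends Exception {
        ConstraintException(String message) {
            super(message);
        }
    }

    /** Hypothetical minimal interface a user would implement. */
    interface Constraint {
        void check(SimplePut put) throws ConstraintException;
    }

    /** Example constraint: reject puts whose value is null or empty. */
    static final class NonEmptyValue implements Constraint {
        @Override
        public void check(SimplePut put) throws ConstraintException {
            if (put.value == null || put.value.isEmpty()) {
                throw new ConstraintException(
                        "row '" + put.row + "': value must not be empty");
            }
        }
    }

    /** Stand-in for a table that runs every constraint before accepting a put. */
    static final class ConstrainedTable {
        private final List<Constraint> constraints = new ArrayList<>();
        private final List<SimplePut> rows = new ArrayList<>();

        void addConstraint(Constraint constraint) {
            constraints.add(constraint);
        }

        void put(SimplePut put) throws ConstraintException {
            for (Constraint constraint : constraints) {
                constraint.check(put); // an invalid put never reaches the table
            }
            rows.add(put);
        }

        int size() {
            return rows.size();
        }
    }

    public static void main(String[] args) {
        ConstrainedTable table = new ConstrainedTable();
        table.addConstraint(new NonEmptyValue());
        try {
            table.put(new SimplePut("row1", "hello")); // valid, gets written
            table.put(new SimplePut("row2", ""));      // invalid, throws
        } catch (ConstraintException e) {
            System.out.println("Put rejected: " + e.getMessage());
        }
        System.out.println("Rows stored: " + table.size());
    }
}
```

In a real deployment the check would presumably run server side in a coprocessor hook rather than in a wrapper class, so the exception would travel back to the client over RPC as described above.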
-
Re: adding constraints
Ted Yu 2011-10-17, 18:00
Jesse:
This is a nice initiative.
Looks like the Constraint you define below is per table. Meaning it is not cross-table referential integrity.

Cheers

On Mon, Oct 17, 2011 at 10:45 AM, Jesse Yates <[EMAIL PROTECTED]> wrote:
> [...]
> Constraints would be set on a per-table basis and the user would be expected
> to ensure the jars containing the constraint are present on the machines
> serving that table.
> [...]
-
Re: adding constraints
Jesse Yates 2011-10-17, 18:04
On Mon, Oct 17, 2011 at 11:00 AM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Jesse:
> This is a nice initiative.
> Looks like the Constraint you define below is per table. Meaning it is not
> cross-table referential integrity.

Theoretically we could support doing this. And if people were really cheeky with the current implementation, they could access other tables to enforce it (though it would kill you on access time). Even so, doing the cross-table checks is going to be rough on run time (cross-server locking is always bad news bears ;), so I'm thinking this should definitely be a later consideration.

> Cheers
-
Re: adding constraints
Ted Yu 2011-10-17, 18:10
Jesse:
I agree with your observations.

Constraint, defined for single table, would be useful.

Please file a JIRA and describe your strategy there.

Thanks
-
Re: adding constraints
Jesse Yates 2011-10-17, 18:27
Added HBASE-4605 <https://issues.apache.org/jira/browse/HBASE-4605> (and approach comments) for this issue.

-Jesse
-
Re: adding constraints
lars hofhansl 2011-10-18, 05:00
My $0.02...

I'd rather include an example of how to do this with a coprocessor (similar to what we do with the aggregation client), rather than a new HBase feature. If the example is easy to extend and to compile to a jar we have achieved almost the same.

Also - as an anecdote - every semi-large relational database I worked with professionally had constraints turned off for performance reasons and instead implemented constraints at the application layer.

-- Lars
-
Re: adding constraints
Jesse Yates 2011-10-18, 05:24
Yeah, in many large installations, turning off constraints makes a lot of sense (do checking before you put the data over the wire, rather than server side). However, server-side constraints still make sense on multi-tenant systems, or where you are required to enforce certain parameters (constraints) on the data no matter what, due to company policy or whatever.

There is an example of how to do Constraints as a jar with CPs already attached to the ticket, and it's pretty simple. However, the ticket goes into the pluses and minuses of a top-level versus a basic CP-based implementation.

For me, the best reason for top level is to make HBase easy to use and have certain built-in features. Yeah, we can do security, but you have to include the jars, make sure it works, etc., as opposed to _certain_ systems where security is built in. Similar arguments can be made for things like constraints - it's just _easier_ to have it built in, and let people use them (or not) as they choose.

The ticket also talks about abstracting out some of the CP things to make it easier to add other top-level features, which would be a win too. Yeah, they would be backed by CPs, but that doesn't mean it doesn't make sense for people to use the stuff really (as in dead simple) easily.

-Jesse
-
Re: adding constraints
Gary Helmling 2011-10-18, 07:43
> There is an example of how to do Constraints as a jar with CPs already
> attached to the ticket, and it's pretty simple. However, the ticket goes into
> the plusses and minuses for a top-level or just basic CP based
> implementation.
>
> For me, the best reason for top level is to make HBase easy to use and have
> certain built-in features.

Hmm, I wasn't really reading the two implementation options for constraints as a choice between a "built-in" feature and CP based. I'm reading it as a choice between:
1) a bundled CP implementation (which you still have to _enable_) that does constraint checking, loading user classes that implement a simple interface (Constraint or Predicate<Put> or whatever)
2) an abstract CP example class that you have to extend with your own implementation logic, which, if you want to do it right, you'll still wind up with something resembling #1 anyway

FYI, I see option #1 as fairly analogous to the bundled aggregation client that Lars mentioned.

If you want this as real top-level functionality built directly in to, say, the HRegion code paths for puts, the question is why should we add the complexity directly when we have CPs?

> Yeah, we can do security, but you have to include
> the jars make sure it works, etc. As opposed to _certain_ systems where
> security is built in. Similar arguments can be made for things like
> constraints - it's just _easier_ to have it built in, and let people use them
> (or not) as they choose.

We have a security implementation up for review that provides meaningful security. Yes, it has to be enabled to be used and the process of configuring it could be much simpler.

Security is always a matter of trade-offs. You can argue about whether or not we've made the right ones. But the current approach for security was arrived at as a result of extensive discussions with the entire community about the right approach, where many concerns were raised about paying any overhead for security when it was not being used. As a result, all security components were built in a loadable fashion, with the trade-off of some extra configuration complexity.

Yes, Accumulo has "security" always enabled. But this is still not an apples-to-apples comparison. HBase security relies on Kerberos to provide a trusted third party for strong authentication while never sending the password over the wire. Accumulo sends username and password in plain text on the rpc connections. As a result HBase relies on external systems for managing credentials, while Accumulo embeds its own user database, with the usernames and hashed passwords stored as globally readable znodes in zookeeper. You could say that reliance on an external system makes the HBase setup more complex, but that's a narrow view. While managing an internal user database does keep things self-contained, it also forces you to create usernames and passwords for an application in multiple places (your application does run under its own account, right?), adding its own complexity.

Accumulo allows access control labels to be placed on each key value individually, while HBase uses a simpler model for assignments limited to table, column family, or column qualifier scope.

Each system makes its own trade-offs based on its implementation goals. What's right for you is going to depend on your needs. But the HBase approach did not just disregard simplicity willy-nilly.

> The ticket also talks about abstracting out some of the CP things to make it
> easier to add other top level features, which would be a win too. Yeah, they
> would be backed by CPs, but that doesn't mean it doesn't make sense for
> people to use the stuff really (as in dead simple) easily.

Again, I don't really see the other changes discussed (HBASE-4554?) as top-level vs. CP-based. I think that change is just about providing the shell with the ability to easily set arbitrary attributes on HTableDescriptor. Those already exist, they're just not properly exposed in the shell. Maybe you're envisioning something beyond this for the constraints case? That may be good too, but we should probably move the discussion over to the JIRA.

It may not sound like it, but I'm all in favor of making things as simple as possible. It's just that, when simplifying, you're usually moving complexity from one place to another. So let's work out where we can get the biggest benefit.
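Option #1 above (a bundled checker that loads user classes implementing a simple interface) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `constraint.` attribute-key prefix, the `load` helper and the `NonEmptyRow` class are hypothetical, a plain `Map<String, String>` stands in for the table attributes, and `Predicate<String>` over a row key stands in for the `Predicate<Put>` idea so the sketch runs without HBase on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class ConstraintLoaderSketch {

    /** Hypothetical convention: attributes starting with this prefix name constraint classes. */
    static final String PREFIX = "constraint.";

    /** Example user-supplied check: row keys must be non-empty. */
    public static final class NonEmptyRow implements Predicate<String> {
        @Override
        public boolean test(String row) {
            return row != null && !row.isEmpty();
        }
    }

    /** Instantiates every class named under the constraint prefix via reflection. */
    @SuppressWarnings("unchecked")
    static List<Predicate<String>> load(Map<String, String> tableAttributes)
            throws ReflectiveOperationException {
        List<Predicate<String>> constraints = new ArrayList<>();
        for (Map.Entry<String, String> attribute : tableAttributes.entrySet()) {
            if (!attribute.getKey().startsWith(PREFIX)) {
                continue; // unrelated table attribute
            }
            Class<?> cls = Class.forName(attribute.getValue());
            if (!Predicate.class.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(
                        attribute.getValue() + " does not implement Predicate");
            }
            constraints.add(
                    (Predicate<String>) cls.getDeclaredConstructor().newInstance());
        }
        return constraints;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("constraint.0", NonEmptyRow.class.getName());
        attributes.put("owner", "someone"); // ignored by the loader

        for (Predicate<String> constraint : load(attributes)) {
            System.out.println("row1 accepted: " + constraint.test("row1"));
            System.out.println("empty row accepted: " + constraint.test(""));
        }
    }
}
```

The point of the design is that the user only writes the small predicate class; the loading and enforcement plumbing lives once in the bundled piece, which still has to be enabled for the table.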
-
Re: adding constraints
Andrew Purtell 2011-10-18, 22:31
> > Yeah, we can do security, but you have to include
> > the jars make sure it works, etc. As opposed to _certain_ systems where
> > security is built in. Similar arguments can be made for things like
> > constraints - its just _easier_ to have it built in, and let people use them
> > (or not) as they choose.

> [...] But the current approach for security was
> arrived at as a result of extensive discussions with the entire
> community about the right approach, where many concerns were raised
> about paying any overhead for security when it was not being used. As
> a result, all security components were built in a loadable fashion,
> with the trade-off of some extra configuration complexity

This discussion more than casually reminds me of past discussions, seen on both the Linux and FreeBSD mailing lists, about moving from a statically linked kernel to one that supports dynamically loaded modules.

Again we have a tightly coupled code base making a transition to dynamic runtime composition.

IMO, anyone concerned that HBase doesn't have security or constraints built in can ship a default configuration that has either or both loaded as system coprocessors. Those that don't want the "bloat" can simply not load them. This balances the demands we will see over a continuum here, from those that want the most functionality "out of the box", to those that want maximum performance or minimal runtime complexity or both.

If there is sufficient concern about user-friendliness, those so concerned could build the plumbing to automatically load coprocessors a la modprobe. Perhaps by reading hbase-site.xml and matching config vars to CP jars (via reflection and some kind of decorator convention?).

I also see HBASE-4554 as about improving how CPs get configuration, if needed, and how the user can change it.

It looks like everyone is in favor of, or at least does not object to, some sort of constraint checking and enforcement implemented as a coprocessor, independent of the core code.

Personally, I have the same attitude about this as I did security -- it's great to have, and even better if it can be dynamically loaded only as needed so those that do not want it suffer no overheads or performance degradation.

- Andy
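The "a la modprobe" idea above, matching configuration variables to coprocessors, might look something like the sketch below. Everything in it is an assumption made for illustration: the `example.feature.*` keys and the `com.example.*` class names are invented, plain maps stand in for the parsed site configuration, and nothing here corresponds to an existing HBase facility.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AutoLoadSketch {

    /**
     * Returns the coprocessor class names whose feature flag is set to "true"
     * in the site configuration. The registry maps a (hypothetical) config key
     * to the (hypothetical) coprocessor class that provides that feature.
     */
    static List<String> coprocessorsToLoad(Map<String, String> siteConfig,
                                           Map<String, String> registry) {
        List<String> toLoad = new ArrayList<>();
        for (Map.Entry<String, String> entry : registry.entrySet()) {
            String flag = siteConfig.get(entry.getKey());
            if ("true".equalsIgnoreCase(flag)) { // a missing flag means "do not load"
                toLoad.add(entry.getValue());
            }
        }
        return toLoad;
    }

    public static void main(String[] args) {
        // Hypothetical registry of optional features.
        Map<String, String> registry = new LinkedHashMap<>();
        registry.put("example.feature.security", "com.example.SecurityCoprocessor");
        registry.put("example.feature.constraints", "com.example.ConstraintCoprocessor");

        // Stand-in for values parsed from the site configuration file.
        Map<String, String> siteConfig = new LinkedHashMap<>();
        siteConfig.put("example.feature.constraints", "true");

        // Only the constraints coprocessor is selected; security stays unloaded.
        System.out.println(coprocessorsToLoad(siteConfig, registry));
    }
}
```

Those who want everything "out of the box" would ship a configuration with all flags on, while everyone else pays nothing for features they leave off.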
-
Re: adding constraintsAndrew Purtell 2011-10-18, 22:42
Sorry, I meant to refer to HBASE-4605.
I was here, kind of, a long time ago with HBASE-2395. I closed that as a duplicate of HBASE-4605 because what Jesse wrote is clearer.

- Andy

>________________________________
>From: Andrew Purtell <[EMAIL PROTECTED]>
>To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>Sent: Tuesday, October 18, 2011 3:31 PM
>Subject: Re: adding constraints
>
>> > Yeah, we can do security, but you have to include
>> > the jars, make sure it works, etc. As opposed to _certain_ systems where
>> > security is built in. Similar arguments can be made for things like
>> > constraints - it's just _easier_ to have it built in, and let people use them
>> > (or not) as they choose.
>
>> [...] But the current approach for security was
>> arrived at as a result of extensive discussions with the entire
>> community about the right approach, where many concerns were raised
>> about paying any overhead for security when it was not being used. As
>> a result, all security components were built in a loadable fashion,
>> with the trade-off of some extra configuration complexity
>
>This discussion more than casually reminds me of past discussions regarding moving from a statically linked kernel to one that supports dynamically loaded modules seen on both the Linux and FreeBSD mailing lists.
>
>Again we have a tightly coupled code base making a transition to dynamic runtime composition.
>
>IMO, anyone concerned that HBase doesn't have security or constraints built in can ship a default configuration that has either or both loaded as system coprocessors. Those that don't want the "bloat" can simply not load them. This balances the demands we will see over a continuum here, from those that want the most functionality "out of the box", to those that want maximum performance or minimal runtime complexity or both.
>
>If there is sufficient concern about user-friendliness, those so concerned could build the plumbing to automatically load coprocessors a la modprobe. Perhaps by reading hbase-site.xml and matching config vars to CP jars (via reflection and some kind of decorator convention?).
>
>I also see HBASE-4554 as about improving how CPs get configuration, if needed, and how the user can change it.
>
>It looks like everyone is in favor of, or at least does not object to, some sort of constraint checking and enforcement implemented as a coprocessor, independent of the core code.
>
>Personally, I have the same attitude about this as I did security -- it's great to have, and even better if it can be dynamically loaded only as needed so those that do not want it suffer no overheads or performance degradation.
>
> - Andy
>
>>________________________________
>>From: Gary Helmling <[EMAIL PROTECTED]>
>>To: [EMAIL PROTECTED]
>>Sent: Tuesday, October 18, 2011 12:43 AM
>>Subject: Re: adding constraints
>>
>>>
>>> There is an example of how to do Constraints as a jar with CPs already
>>> attached to the ticket, and it's pretty simple. However, the ticket goes into
>>> the plusses and minuses for a top-level or just basic CP based
>>> implementation.
>>>
>>> For me, the best reason for top level is to make HBase easy to use and have
>>> certain built-in features.
>>
>>Hmm, I wasn't really reading the two implementation options for
>>constraints as a choice between a "built-in" feature and CP based.
>>I'm reading it as a choice between:
>>1) a bundled CP implementation (which you still have to _enable_) that
>>does constraint checking loading user classes that implement a simple
>>interface (Constraint or Predicate<Put> or whatever)
>>2) an abstract CP example class that you have to extend with your own
>>implementation logic, which, if you want to do it right, you'll still
>>wind up with something resembling #1 anyway
>>
>>FYI, I see option #1 as fairly analogous to the bundled aggregation
>>client that Lars mentioned.
>>
>>If you want this as real top-level functionality built directly in to,
>>say, the HRegion code paths for puts, the question is why should we [...]
-
Re: adding constraints
Jesse Yates 2011-10-19, 00:49
Comments inline.
On Tue, Oct 18, 2011 at 12:43 AM, Gary Helmling <[EMAIL PROTECTED]> wrote:

> >
> > There is an example of how to do Constraints as a jar with CPs already
> > attached to the ticket, and it's pretty simple. However, the ticket goes into
> > the plusses and minuses for a top-level or just basic CP based
> > implementation.
> >
> > For me, the best reason for top level is to make HBase easy to use and have
> > certain built-in features.
>
> Hmm, I wasn't really reading the two implementation options for
> constraints as a choice between a "built-in" feature and CP based.
>

Either way it would be CP based, but the 'built-in' would just have some 'nice' ways of adding things. In short, it's a question of adding a method to the HTD for addConstraint() to add a bunch of classes to be run by the 'constraint CP'.

I could theoretically see a situation where people would want to have the constraint extend from some other class (due to legacy code), meaning extending an existing CP is a little more of a pain.

So, yeah, it still looks like #1 (below), but it's easier to use. And if you don't want to enable constraints, don't add the constraint jar to the CP list - no runtime slowdown, and it's still a bit similar to how security is done.

> I'm reading it as a choice between:
> 1) a bundled CP implementation (which you still have to _enable_) that
> does constraint checking loading user classes that implement a simple
> interface (Constraint or Predicate<Put> or whatever)
> 2) an abstract CP example class that you have to extend with your own
> implementation logic, which, if you want to do it right, you'll still
> wind up with something resembling #1 anyway
>
> FYI, I see option #1 as fairly analogous to the bundled aggregation
> client that Lars mentioned.
>
> If you want this as real top-level functionality built directly in to,
> say, the HRegion code paths for puts, the question is why should we
> add the complexity directly when we have CPs?
>

I feel like having the addConstraint() for a table is actually _less_ complexity. Not necessarily from the overall system perspective, certainly (you have to do a little abstraction and a couple more methods), but it's not that much more as it's all centered around the HTD.

> > Yeah, we can do security, but you have to include
> > the jars, make sure it works, etc. As opposed to _certain_ systems where
> > security is built in. Similar arguments can be made for things like
> > constraints - it's just _easier_ to have it built in, and let people use them
> > (or not) as they choose.
> >
>
> We have a security implementation up for review that provides
> meaningful security. Yes, it has to be enabled to be used and the
> process of configuring it could be much simpler. Security is always a
> matter of trade-offs. You can argue about whether or not we've
> made the right ones. But the current approach for security was
> arrived at as a result of extensive discussions with the entire
> community about the right approach, where many concerns were raised
> about paying any overhead for security when it was not being used. As
> a result, all security components were built in a loadable fashion,
> with the trade-off of some extra configuration complexity.
>
> Yes, Accumulo has "security" always enabled. But this is still not an
> apples-to-apples comparison. HBase security relies on Kerberos to
> provide a trusted third party for strong authentication while never
> sending the password over the wire. Accumulo sends username and
> password in plain text on the rpc connections. As a result HBase
> relies on external systems for managing credentials, while Accumulo
> embeds its own user database, with the usernames and hashed passwords
> stored as globally readable znodes in zookeeper. You could say that
> reliance on an external system makes the HBase setup more complex, but
> that's a narrow view. While managing an internal user database does
> keep things self contained, it also forces you to create usernames and

Sorry for bringing security up flippantly - clearly you guys have thought about that a lot and I wasn't trying to imply that you hadn't. Yeah, Accumulo has a different model (and clearly has its flaws) and is running in an, arguably, very different environment (with different requirements) than most people running HBase. I think it makes sense to not have security impact performance by making it loadable. However, loading should be easy.

What I'm concerned about is the configuration complexity - there are a ton of configuration values already and adding more starts to be crazy. HBase has already made some tradeoffs, but if we keep adding more and more configuration values, it's going to be close to unusable to anyone that doesn't have serious knowledge about the system and how to configure it.

I would rather make it dead simple for people looking at the main interface calls (e.g. "ok, here is where I add a coprocessor", "here is where I enable security", "here is where I add a constraint", etc.) rather than digging through the conf, and all you have to do is enable the constraint CP or the security CP. Right now you just need to add it to the list of regionserver CPs, but what if you just had to set a boolean? Let's go super easy. Heck, security is getting its own module (HBASE-4336), so it's reasonable to think that we can include some configuration-specific stuff to support that.

Ok, clearly the main thread through all of this is I would like to make it easier to load/unload features.

Constraints was something (a) I thought HBase could use, (b) would be doable pretty easily with CPs, and (c) would put us down the path of making HBase easier to run/setup for users. The latter goes for security, constraints, and other new/existing features.

-Jesse
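A minimal sketch of the 'constraint CP' idea discussed in this message (a single bundled component that loads user-supplied classes and runs each one before a put is accepted). It is plain Java with no HBase dependency, so it is only an illustration of the shape, not the implementation attached to HBASE-4605. Every name in it (PutRule, MaxLengthRule, ConstraintViolation, ConstraintRunner, prePut here) is hypothetical, and a row key is modelled as a String rather than a real Put.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Demo entry point (hypothetical); declared first so `java ConstraintRunnerSketch.java` runs it.
class ConstraintRunnerSketch {
    public static void main(String[] args) {
        // Class names would come from table metadata; here they are listed by hand.
        List<String> names = Collections.singletonList(MaxLengthRule.class.getName());
        ConstraintRunner runner = ConstraintRunner.load(names);

        runner.prePut("short-key"); // accepted, no exception

        StringBuilder longKey = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            longKey.append('x');
        }
        try {
            runner.prePut(longKey.toString());
        } catch (ConstraintViolation e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

// Hypothetical minimal interface a user would implement.
interface PutRule {
    void check(String rowKey);
}

// Hypothetical exception that would be propagated back to the client.
class ConstraintViolation extends RuntimeException {
    ConstraintViolation(String message) {
        super(message);
    }
}

// Example user constraint: reject row keys longer than 64 characters.
class MaxLengthRule implements PutRule {
    @Override
    public void check(String rowKey) {
        if (rowKey.length() > 64) {
            throw new ConstraintViolation("row key longer than 64 characters");
        }
    }
}

// Stand-in for the bundled 'constraint CP': loads rules by class name, runs them all.
final class ConstraintRunner {
    private final List<PutRule> rules;

    private ConstraintRunner(List<PutRule> rules) {
        this.rules = rules;
    }

    static ConstraintRunner load(List<String> classNames) {
        List<PutRule> rules = new ArrayList<>();
        for (String name : classNames) {
            try {
                Class<? extends PutRule> type = Class.forName(name).asSubclass(PutRule.class);
                rules.add(type.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot load constraint " + name, e);
            }
        }
        return new ConstraintRunner(rules);
    }

    // Called before a write is applied; any rule may veto it by throwing.
    void prePut(String rowKey) {
        for (PutRule rule : rules) {
            rule.check(rowKey);
        }
    }
}
```

With no rules configured, prePut does nothing, which matches the point made in the thread that a table not using constraints should pay no runtime cost.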
-
Re: adding constraints
Gary Helmling 2011-10-19, 02:10
>>
>> Hmm, I wasn't really reading the two implementation options for
>> constraints as a choice between a "built-in" feature and CP based.
>>
>
> Either way it would be CP based, but the 'built-in' would just have some
> 'nice' ways of adding things. In short, it's a question of adding a method to
> the HTD for addConstraint() to add a bunch of classes to be run by the
> 'constraint CP'.
>

I think we're on the same page here (just the details to work out). But I think for most people on this list, saying "top level" or "built in" feature would imply something not CP based, so we should be careful about terminology.

>
> I feel like having the addConstraint() for a table is actually _less_
> complexity. Not necessarily from the overall system perspective certainly
> (you have to do a little abstraction and a couple more methods), but it's not
> that much more as it's all centered around the HTD.
>

For a single case, yes, this is simpler. But it shifts complexity from the exposed configuration into the HTD code. What happens when we have 20 such cases? HTD starts to become a bit of a mess with special casing for each. I totally understand the motivation -- we did something similar with table "owners" in the patch for HBASE-3025. But I'm starting to think we need to handle it differently there and here to keep things scalable.

I think we need to invert this, so that CPs can take ownership for adding their own configs to HTD, instead of making HTD take ownership for all. Something like:

    HTableDescriptor htd = new HTableDescriptor(...);
    Constraints.add(htd, MyConstraintImpl.class);
    admin.createTable(htd);

I think this is the best way to keep the code extensibility scalable.

We'd have to work out how exactly this integrates with the HBase shell. But given that jruby gives us a dynamic language to work with, we should be able to figure something out. I think making the shell more extensible is also an important part of this. For HBASE-3025 we needed to add some shell commands, and there's not really a "loadable" way of doing so at the moment.

>
> What I'm concerned about is the configuration complexity - there are a ton
> of them and adding more starts to be crazy. HBase has already made some
> tradeoffs, but if we keep adding more and more configuration values, it's
> going to be close to unusable to anyone that doesn't have serious knowledge
> about the system and how to configure it.
>

I completely agree that security has a long way to go here. Some configuration has to be there -- we need to know the principals for the various services, keytab files for logins -- but the rest of the config for the loadable security bits should really be just a single setting. I totally agree with the vision here. We'll get there.

>
> Ok, clearly the main thread through all of this is I would like to make it
> easier to load/unload features.
>
> Constraints was something (a) I thought hbase could use, (b) would be doable
> pretty easily with CPs, and (c) would put us down the path of making hbase
> easier to run/setup for users. The latter goes for security, constraints,
> and other new/existing features.
>

Agree with all of this, and I appreciate that you're looking at improving this stuff. Configuration and operability is a critical part of the user experience and we have a long way to go in streamlining it.

--gh
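The inversion proposed in this message (the feature writes its own settings into the table descriptor through generic key/value storage, so the descriptor needs no constraint-specific method) can be sketched roughly as below. This is an illustration under stated assumptions, not HBase code: TableDescriptorStub is a hypothetical stand-in for HTableDescriptor reduced to string key/value pairs, the Constraint interface and NonEmptyRowConstraint are made up for the example, and the metadata key "constraints.classes" is an assumed name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Demo entry point (hypothetical); declared first so `java ConstraintSketch.java` runs it.
class ConstraintSketch {
    public static void main(String[] args) {
        TableDescriptorStub htd = new TableDescriptorStub("users");
        Constraints.add(htd, NonEmptyRowConstraint.class);
        // Prints the class names recorded on the descriptor.
        System.out.println(Constraints.list(htd));
    }
}

// Hypothetical stand-in for HTableDescriptor: a table name plus generic string metadata.
class TableDescriptorStub {
    private final String name;
    private final Map<String, String> values = new LinkedHashMap<>();

    TableDescriptorStub(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    void setValue(String key, String value) {
        values.put(key, value);
    }

    String getValue(String key) {
        return values.get(key);
    }
}

// Hypothetical constraint interface; an implementation rejects a row key by throwing.
interface Constraint {
    void check(String rowKey);
}

class NonEmptyRowConstraint implements Constraint {
    @Override
    public void check(String rowKey) {
        if (rowKey == null || rowKey.isEmpty()) {
            throw new IllegalArgumentException("row key must not be empty");
        }
    }
}

// The feature owns its descriptor keys; the descriptor knows nothing about constraints.
final class Constraints {
    private static final String KEY = "constraints.classes"; // assumed key name

    private Constraints() {
    }

    // Appends the constraint's class name to a comma-separated list on the descriptor.
    static void add(TableDescriptorStub htd, Class<? extends Constraint> type) {
        String existing = htd.getValue(KEY);
        String name = type.getName();
        htd.setValue(KEY, existing == null ? name : existing + "," + name);
    }

    // Reads back the class names in the order they were added.
    static List<String> list(TableDescriptorStub htd) {
        List<String> names = new ArrayList<>();
        String existing = htd.getValue(KEY);
        if (existing != null) {
            for (String name : existing.split(",")) {
                names.add(name);
            }
        }
        return names;
    }
}
```

The design point is that a twentieth feature would add its own helper class next to Constraints instead of a twentieth special case on the descriptor.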
-
Re: adding constraints
Jesse Yates 2011-10-19, 02:41
On Tue, Oct 18, 2011 at 7:10 PM, Gary Helmling <[EMAIL PROTECTED]> wrote:
> >>
> >> Hmm, I wasn't really reading the two implementation options for
> >> constraints as a choice between a "built-in" feature and CP based.
> >>
> >
> > Either way it would be CP based, but the 'built-in' would just have some
> > 'nice' ways of adding things. In short, it's a question of adding a method to
> > the HTD for addConstraint() to add a bunch of classes to be run by the
> > 'constraint CP'.
> >
>
> I think we're on the same page here (just the details to work out).
> But I think for most people on this list, saying "top level" or "built
> in" feature would imply something not CP based, so we should be
> careful about terminology.
>

Agreed.

> >
> > I feel like having the addConstraint() for a table is actually _less_
> > complexity. Not necessarily from the overall system perspective certainly
> > (you have to do a little abstraction and a couple more methods), but it's not
> > that much more as it's all centered around the HTD.
> >
>
> For a single case, yes, this is simpler. But it shifts complexity
> from the exposed configuration into the HTD code. What happens when
> we have 20 such cases? HTD starts to become a bit of a mess with
> special casing for each. I totally understand the motivation -- we
> did something similar with table "owners" in the patch for HBASE-3025.
> But I'm starting to think we need to handle it differently there and
> here to keep things scalable.
>

Yeah, this can start to be a bit of a mess.

>
> I think we need to invert this, so that CPs can take ownership for
> adding their own configs to HTD, instead of making HTD take ownership
> for all. Something like:
>
>     HTableDescriptor htd = new HTableDescriptor(...);
>     Constraints.add(htd, MyConstraintImpl.class);
>     admin.createTable(htd);
>

+1 I like the idea. It also feels very 'hadoopy' (e.g. input/output formats).

>
> I think this is the best way to keep the code extensibility scalable.
>
> We'd have to work out how exactly this integrates with the HBase
> shell. But given that jruby gives us a dynamic language to work with,
> we should be able to figure something out. I think making the shell
> more extensible is also an important part of this. For HBASE-3025 we
> needed to add some shell commands, and there's not really a "loadable"
> way of doing so at the moment.
>

It will be interesting to see how that feature shakes out.

Do you think you can put the core of what we are agreeing on into 4605? I want to make sure we don't lose any of your comments.

Thanks,
Jesse
-
Re: adding constraints
Gary Helmling 2011-10-19, 18:18
>
> Do you think you can put the core what we are agreeing on into 4605? I want
> to make sure we don't lose any of your comments
>

Sure, I'll try to summarize in a comment on 4605.

I think we'll need to open a new JIRA for the shell aspects of this as well, since it looks like 4554 is only handling directly setting a coprocessor and we really need something more general.
-
Re: adding constraints
Jesse Yates 2011-10-19, 18:29
Thanks!
I think 4605 may need to be pulled under a blanket ticket for doing the improvements like setting general properties/dynamic loading of modules.

-Jesse

On Wed, Oct 19, 2011 at 11:18 AM, Gary Helmling <[EMAIL PROTECTED]> wrote:

> >
> > Do you think you can put the core what we are agreeing on into 4605? I want
> > to make sure we don't lose any of your comments
> >
>
> Sure I'll try to summarize in a comment on 4605.
>
> I think we'll need to open a new JIRA for the shell aspects of this as
> well, since it looks like 4554 is only handling directly setting a
> coprocessor and we really need something more general.
>
|