Accumulo dev mailing list


Re: "NOT" operator in visibility string
It’d be nice to have lots of use cases documented. I suspect there are more cases that could be added to this list.

Data labeling has always been very flexible, but as a result, deciding how to label data has proven somewhat difficult in some cases. The original design and intent of security labels was to let users express certain attributes of the data, such as its sensitivity or source.

These attributes are not the only thing that needs to be considered when deciding to grant read access. Other factors, like users’ roles and responsibilities, are handled outside the security labeling system, for example in an external application that maps users to roles. These aspects of security can change over time, and as such they are typically managed in an external, mutable system rather than stored in the security label.
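To make the split concrete, here is a minimal Python sketch (not Accumulo’s API): the label stored with each value describes only the data, while a separate, mutable mapping decides which tokens each user holds. `USER_TOKENS` is a hypothetical stand-in for whatever external role system is in use, and labels are simplified to `&`-joined tokens, a small subset of Accumulo’s visibility expression syntax.

```python
# Immutable attributes of the data: each value carries a label describing
# the data itself (sensitivity, source), never who may read it.
# Labels are simplified to "&"-joined tokens (AND only), an assumption
# standing in for Accumulo's fuller visibility expressions.
RECORDS = {
    "row1": ("payroll totals", "secret&finance"),
    "row2": ("press release", "public"),
}

# Mutable, external state: a hypothetical role system mapping users to the
# tokens they currently hold. Changing it never touches RECORDS.
USER_TOKENS = {
    "alice": {"public", "secret", "finance"},
    "bob": {"public"},
}


def can_read(label: str, tokens: set[str]) -> bool:
    """True if the user holds every token the label requires."""
    required = {t for t in label.split("&") if t}
    return required <= tokens


def visible_rows(user: str) -> list[str]:
    """Rows whose labels are satisfied by the user's current tokens."""
    tokens = USER_TOKENS.get(user, set())
    return [row for row, (_, label) in RECORDS.items() if can_read(label, tokens)]
```

Promoting `bob` is then a single update such as `USER_TOKENS["bob"] |= {"secret", "finance"}`, after which `visible_rows("bob")` includes `row1`; no stored label is rewritten.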

I think users can get into trouble when they begin to push attributes that aren’t specific to the data into the security label. Attributes of the data itself are not likely to change over time, so it makes sense to store them in a system of immutable but versioned data like Accumulo. Attributes that aren’t specific to the data do change, and every change means relabeling. ‘Rewriting’ data labels (i.e. writing a new version of all the data) is non-trivial when there is a lot of data.
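To put a rough number on that cost, the toy model below (a hypothetical in-memory stand-in, not how Accumulo stores data) keeps a list of versions per key and relabels by appending a new version of every affected entry. The token `team_a` is an illustrative example of an attribute that isn’t specific to the data, such as a group of users.

```python
from collections import defaultdict

# Toy append-only, versioned store: each key keeps a list of (value, label)
# versions and readers see the newest one. Illustrative assumption only.
store: dict[str, list[tuple[str, str]]] = defaultdict(list)


def put(key: str, value: str, label: str) -> None:
    store[key].append((value, label))


def relabel(old_token: str, new_token: str) -> int:
    """Write a new version of every entry whose latest label uses old_token.

    Returns the number of writes, which grows with the amount of data.
    """
    writes = 0
    for key, versions in store.items():
        value, label = versions[-1]
        tokens = label.split("&")
        if old_token in tokens:
            new_label = "&".join(new_token if t == old_token else t for t in tokens)
            versions.append((value, new_label))
            writes += 1
    return writes
```

After labeling 1,000 entries `secret&team_a`, `relabel("team_a", "team_b")` returns 1000: one write per entry. The same change kept in an external user-to-token mapping would be a single update.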
As for #3 and #4 below ...

On Mar 10, 2014, at 2:07 PM, Mike Drob <[EMAIL PROTECTED]> wrote:
I believe this case can be handled today (i.e. without NOT) by removing the user’s secret and top-secret assignments, for as long as the user is on probation, in whatever external system manages those assignments. Probation is temporary and not specific to the data, so it is not appropriate information to store in a security label.
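A minimal sketch of the probation case, assuming a hypothetical external assignment service (`TokenAssignments`, not part of Accumulo) that owns the user-to-token mapping. Putting a user on probation withholds `secret` and `top-secret` without touching any label; lifting it restores them.

```python
class TokenAssignments:
    """Hypothetical external system deciding which tokens a user gets.

    Probation is mutable state about the user, kept here rather than in
    any security label.
    """

    RESTRICTED_DURING_PROBATION = frozenset({"secret", "top-secret"})

    def __init__(self) -> None:
        self._granted: dict[str, set[str]] = {}
        self._on_probation: set[str] = set()

    def grant(self, user: str, *tokens: str) -> None:
        self._granted.setdefault(user, set()).update(tokens)

    def set_probation(self, user: str, on: bool) -> None:
        if on:
            self._on_probation.add(user)
        else:
            self._on_probation.discard(user)

    def effective_tokens(self, user: str) -> set[str]:
        """Granted tokens minus anything withheld while on probation."""
        tokens = set(self._granted.get(user, set()))
        if user in self._on_probation:
            tokens -= self.RESTRICTED_DURING_PROBATION
        return tokens
```

In an Accumulo client, the set returned by `effective_tokens` is what would be turned into the `Authorizations` used for a scan. How this external system is kept in sync with Accumulo’s own per-user authorizations is left out of the sketch.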
This case is somewhat strange, since users doubtless have access to their own files at other times and for other purposes (e.g. when creating the files), just not for the purpose of review. Such access controls are also not a function of the data, but rather of the application’s purpose.

In this case, the external system managing assignments of users to security tokens could simply be configured to grant each user the tokens for everyone else’s files but not the token for their own. This is not hard because the set of users is known and available in the external token assignment system.

In other applications, such as those in which users create and manage their own files, the external assignment system would simply assign to users the tokens to access their own files.
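Both assignment policies can be sketched against the same labels, assuming a hypothetical naming scheme in which each user’s files are labeled `files_<user>`. Only the token assignment differs between the review application and the self-service one.

```python
def owner_token(user: str) -> str:
    # Hypothetical naming scheme: each user's files are labeled files_<user>.
    return f"files_{user}"


def review_tokens(reviewer: str, all_users: list[str]) -> set[str]:
    """Tokens for a review application: everyone's files except the reviewer's."""
    return {owner_token(u) for u in all_users if u != reviewer}


def own_file_tokens(user: str) -> set[str]:
    """Tokens for an application where users manage only their own files."""
    return {owner_token(user)}
```

With users `ann`, `ben` and `cat`, `review_tokens("ann", ...)` yields `files_ben` and `files_cat`, while `own_file_tokens("ann")` yields only `files_ann`. No label needs a NOT to exclude the owner.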

I don’t see these cases making an argument for a NOT operator in the security label.

———

Here is a difficult use case, known as the ‘ethical wall’ (among other names), which is intended to prevent conflicts of interest:

Users can access any one type of data, but once they have accessed one type they cannot access any other type.

For example, once you have accessed the audit logs, you cannot access the primary data set, and vice versa. Or say a researcher can see one company’s information, but once she has seen it, she can’t see any other company’s information without risking a conflict of interest.

If, for whatever reason, users were the ones to pick which single data set they got to see, they would have to be excluded from all other data sets after that point.

Again, this information (which data sets a user has ever seen) is not specific to the data and so should not be stored in the security label. Rather, an application can be written to keep track of which data set, if any, each user has seen. If the user has never run a query, she can query any data set once. After that, she can only issue queries to that same data set. The application can simply record which data set each user chose and assign security tokens accordingly.
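The application-side bookkeeping might look roughly like this sketch. `EthicalWall` is a hypothetical class, and each data set is assumed to be labeled with a token of the same name.

```python
class EthicalWall:
    """Hypothetical application-side tracker for the 'ethical wall' rule.

    The first data set a user queries becomes the only one they may query
    from then on. This history lives in the application, not in labels.
    """

    def __init__(self, datasets: set[str]) -> None:
        # Assumption: each data set is labeled with a token of the same name.
        self._datasets = set(datasets)
        self._chosen: dict[str, str] = {}

    def tokens_for(self, user: str) -> set[str]:
        """Before any query every data set is open; afterwards only the chosen one."""
        if user in self._chosen:
            return {self._chosen[user]}
        return set(self._datasets)

    def query(self, user: str, dataset: str) -> bool:
        """Record the user's choice on first query; return whether access is allowed."""
        if dataset not in self.tokens_for(user):
            return False
        self._chosen.setdefault(user, dataset)
        return True
```

A real deployment would need to persist the recorded choice and update it atomically with the first query. Otherwise two concurrent first queries could both succeed against different data sets.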

———

Another tough use case involves combining data. The sensitivity of some types of data changes when they are combined with other data. This is hard to capture in a security label, since each label applies to exactly one key-value pair.

In this case you’d perhaps need an iterator, or logic in an application, that tracks how many elements are being accessed at once and applies rules for raising or lowering the sensitivity of the entire result set accordingly. It would be interesting to use an iterator for this, since one iterator could adjust security levels based on the combined data, and then the same existing security filtering logic could be reapplied to the newly derived aggregate labels.
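The two-stage idea (derive an aggregate label, then reuse the ordinary filter) can be sketched with plain functions standing in for iterators. The escalation rule is hypothetical: combining more than `threshold` elements adds a required `bulk` token. Only the "more sensitive" direction is shown; a rule could just as well drop tokens.

```python
from typing import Iterable

Entry = tuple[str, str, str]  # (key, value, label); labels are "&"-joined tokens


def can_read(label: str, tokens: set[str]) -> bool:
    # Same AND-only label check used for single entries.
    return {t for t in label.split("&") if t} <= tokens


def combine(entries: Iterable[Entry], threshold: int = 100) -> Entry:
    """Fold a result set into one entry with a derived label.

    Hypothetical rule: require every component token, plus a 'bulk' token
    once more than `threshold` elements are combined.
    """
    entries = list(entries)
    tokens: set[str] = set()
    for _, _, label in entries:
        tokens |= {t for t in label.split("&") if t}
    if len(entries) > threshold:
        tokens.add("bulk")
    value = ",".join(v for _, v, _ in entries)
    return "combined", value, "&".join(sorted(tokens))  # placeholder key


def scan_combined(entries: Iterable[Entry], user_tokens: set[str],
                  threshold: int = 100) -> list[Entry]:
    """Derive the aggregate label first, then apply the ordinary filter to it."""
    combined = combine(entries, threshold)
    return [combined] if can_read(combined[2], user_tokens) else []
```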

For example, users might be able to do point queries (i.e. specify an entire key so that only a single value is returned) and labels could be used as always.

But then a user might fetch a whole row, say using the WholeRowIterator, and the row as a whole might have a higher or lower sensitivity level than any one data element. It would be nice to be able to present a single key-value pair containing the entire row, with a security label that describes the sensitivity of that whole row.
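A rough sketch of packing each row into one pair with a row-level label follows. It does not use Accumulo’s WholeRowIterator; it only mimics the idea. The rule in `row_label` is hypothetical: a row that pairs a `zip` column with a `birthdate` column also requires a `pii` token, even though each cell is labeled only `public` in this toy data.

```python
from itertools import groupby

Cell = tuple[str, str, str, str]  # (row, column, value, label)


def tokens_of(label: str) -> set[str]:
    # Labels simplified to "&"-joined tokens (AND only).
    return {t for t in label.split("&") if t}


def row_label(cells: list[Cell]) -> str:
    """Hypothetical rule for the sensitivity of a whole row.

    Start from every token any cell requires; if the row combines zip and
    birthdate columns, also require 'pii'.
    """
    tokens: set[str] = set()
    for cell in cells:
        tokens |= tokens_of(cell[3])
    if {"zip", "birthdate"} <= {cell[1] for cell in cells}:
        tokens.add("pii")
    return "&".join(sorted(tokens))


def pack_rows(cells: list[Cell]) -> list[tuple[str, dict[str, str], str]]:
    """One (row, {column: value}, label) triple per row."""
    packed = []
    for row, group in groupby(sorted(cells), key=lambda c: c[0]):
        row_cells = list(group)
        packed.append((row, {c[1]: c[2] for c in row_cells}, row_label(row_cells)))
    return packed
```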

Similarly, some iterators summarize data which might result in the data being more or less sensitive.
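The opposite direction, a summary that is less sensitive than its inputs, might be sketched with a hypothetical rule. A count over at least `k` entries is released under a weaker `stats` token, while a smaller count keeps every token its inputs required. The threshold and token name are illustrative only.

```python
def tokens_of(label: str) -> set[str]:
    # Labels simplified to "&"-joined tokens (AND only).
    return {t for t in label.split("&") if t}


def summarize_count(labels: list[str], k: int = 10) -> tuple[int, str]:
    """Count entries and choose a label for the count itself.

    Hypothetical rule: a count over at least k entries is released under
    a weaker 'stats' token; a smaller count keeps all input tokens.
    """
    if len(labels) >= k:
        return len(labels), "stats"
    tokens: set[str] = set()
    for label in labels:
        tokens |= tokens_of(label)
    return len(labels), "&".join(sorted(tokens))
```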

———

In none of the previous four cases do I see a need for NOT to be implemented. But I’d like to hear what other use cases people are looking at.
Aaron