I am curious about the community's thoughts here; this seems like
something many enterprises would drool over with Hive... I am not a coder,
so the level of coding involved in something like this is unknown to me.
On Sat, May 4, 2013 at 8:31 AM, John Omernik <[EMAIL PROTECTED]> wrote:
> We were doing some tests this past week with Hive authorization. One of
> our current use-case "challenges" is when we have an underlying, well-managed
> and partitioned table, and we want to allow access to only certain columns in
> that table. Our first thoughts went to VIEWs, as that's a common use case
> with relational databases (i.e. set up a view with only the columns you
> want the user to access) and set the permissions appropriately.
> In testing, and this is not surprising given the "newness" of Hive
> authorization, a VIEW cannot be created so as to allow access to a table
> without also granting access to the underlying table, defeating the idea of
> the view as a tool to manage that access.
> So I wanted to put this to the user group: I've done some JIRA searching and
> didn't find anything (I will admit my JIRA search-fu is not stellar), but
> is there an option that could be thrown together in Hive that would allow
> that use case? Perhaps a configuration setting that would allow views to
> execute as a specific user (perhaps a global user, or perhaps a user
> specified at view creation). This could allow the "view" to have access to
> the underlying table, and since the view, once created, couldn't be changed
> by the user, you could grant "read" permissions on the view to the user or
> group of users you want to have access.
> I suppose this has challenges (e.g. can a user just create a view to
> bypass table-level restrictions?). Perhaps if this model was taken, a
> privilege for CREATING/MODIFYING views could be created and granted only to
> a superuser of some sort. I am really just walking through ideas here, as
> this is one of the last stumbling blocks we have with Hive from an
> "Enterprise ready" point of view. Heck, if done right, you could almost do
> data masking at the view level. You have a column in your source data that
> is sensitive, so instead of returning that column you do an MD5 (can we have
> a native MD5 function? :) ) of that column, or you blank that column. If we
> put in strong security on the creation and modification of views, and allow
> views to execute as a different user that has access to the source data, you
> have a powerful way to represent your data to all levels within your org.
> Also: Since I am just brainstorming here, I'd love to hear what others
> may be doing around this area. Perhaps the Hive User Community can come up
> with a strategic plan, while at the same time sharing some shorter term