Hadoop, mail # dev - [DISCUSS] Security Efforts and Branching

RE: [DISCUSS] Security Efforts and Branching
Zheng, Kai 2013-09-26, 07:51
Sorry, please kindly allow me to repost this with some cleanup.

Larry, and all

Apologies for not responding sooner. I have read your proposals and thought about how we can collaborate well and speed things up for all of us. From the community discussions around the Hadoop Summit, TokenAuth should be a pluggable full stack that accommodates different implementations. HADOOP-9392 reflects that thinking and came up with the breakdown attached to the JIRA. To simplify the discussion, I will try to illustrate it here at a very high level as follows.

Simply put, we would have:
TokenAuth = TokenAuth framework + TokenAuth implementation (HAS) + TokenAuth integration

= TokenAuth framework
This first defines TokenAuth as the desired pluggable framework, which defines and provides the required APIs, protocols, flows, and facilities, along with common implementations for related constructs, entities, and even services. The framework is a subject for continued discussion and is to be defined together as a common effort of the community. It's important that the framework be pluggable in all the key places so that particular solutions can employ their own product-level implementations. Based on this framework, we could build the HAS implementation. Initially, we have the following items to think about in order to define the relevant APIs and provide core facilities for the framework; the list is to be complemented.
1. Common token definition;
2. TokenAuthn method for Hadoop RPC;
3. Authentication Service;
4. Identity Token Service;
5. Access Token Service;
6. Fine-grained authorization;
7. Attribute Service;
8. Token authentication client;
9. Token cache;
10. Common configuration across TokenAuth;
11. Hadoop token command;
12. Key Provider;
13. Web SSO support;
14. REST SSO support;
15. Auditing support.
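
To make the "pluggable in all the key places" idea a bit more concrete, here is a minimal Java sketch of how a framework-level contract could be kept separate from a product-level implementation. It only touches items 1 and 4 of the list. Every name in it (Token, IdentityTokenService, TokenAuthPlugins, FixedLifetimeIdentityTokenService) and the token fields are assumptions made for illustration; none of them come from HADOOP-9392 or any existing Hadoop API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical common token definition (item 1). The fields are assumptions.
final class Token {
    final String type;            // e.g. "identity" or "access"
    final String principal;       // who the token was issued to
    final long expiresAtMillis;   // absolute expiry time

    Token(String type, String principal, long expiresAtMillis) {
        this.type = type;
        this.principal = principal;
        this.expiresAtMillis = expiresAtMillis;
    }

    boolean isExpired(long nowMillis) {
        return nowMillis >= expiresAtMillis;
    }
}

// Hypothetical framework-level contract (item 4). The framework would own
// only this interface, not the logic behind it.
interface IdentityTokenService {
    Token issue(String principal, long nowMillis);
}

// Hypothetical plug point: implementations register under a name that a
// deployment could select through configuration.
final class TokenAuthPlugins {
    private final Map<String, IdentityTokenService> services = new HashMap<>();

    void register(String name, IdentityTokenService service) {
        services.put(name, service);
    }

    IdentityTokenService lookup(String name) {
        IdentityTokenService service = services.get(name);
        if (service == null) {
            throw new IllegalArgumentException(
                "No identity token service registered as: " + name);
        }
        return service;
    }
}

// One possible implementation, standing in for a server such as HAS.
final class FixedLifetimeIdentityTokenService implements IdentityTokenService {
    private final long lifetimeMillis;

    FixedLifetimeIdentityTokenService(long lifetimeMillis) {
        this.lifetimeMillis = lifetimeMillis;
    }

    @Override
    public Token issue(String principal, long nowMillis) {
        return new Token("identity", principal, nowMillis + lifetimeMillis);
    }
}
```

Under these assumptions, a second implementation could be registered under another name without changing any caller that only depends on IdentityTokenService.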

= TokenAuth implementation (HAS)
This defines and implements the Hadoop AuthN/AuthZ Server (HAS) based on the TokenAuth framework. HAS is a centralized server that addresses AAA (Authentication, Authorization, Auditing) concerns for Hadoop across the ecosystem. The 'A' in HAS could stand for "Authentication", "Authorization", or "Auditing", depending on which role(s) HAS is provisioned with. HAS is a complete and enterprise-ready security solution that is based on the TokenAuth framework and uses the common facilities the framework provides. It customizes and provides all the implementations of the constructs, entities, and services defined in the framework that are required for enterprise deployment. Initially we have the following for the implementation:
1. Provide common and management facilities, including a configuration loading/syncing mechanism, auditing and logging support, a shared high-availability approach, REST support, and so on;
2. Implement the Authentication Server role for HAS, implementing the Authentication Service and Identity Token Service defined in the framework. The authentication engine can be configured with a chain of authentication modules to support multi-factor authentication. In particular, it will support LDAP authentication;
3. Implement the Authorization Server role for HAS, implementing the Access Token Service;
4. Implement centralized administration of fine-grained authorization for the Authorization Server role. Optional in the initial iteration;
5. Implement the Attribute Service for HAS, to allow integration of third-party attribute authorities. Optional in the initial iteration;
6. Provide an authorization enforcement library for Hadoop services to enforce security policies using the related services provided by the Authorization Server. Optional in the initial iteration.
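
As an illustration of item 2, a chain of authentication modules for multi-factor authentication could look roughly like the Java sketch below. This is not how HAS is implemented. The names AuthModule and AuthChain are hypothetical, and the rule that every module in the chain must succeed is an assumption. JAAS-style control flags such as "required" or "sufficient" are deliberately not modeled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical module contract: one authentication factor per module.
interface AuthModule {
    boolean authenticate(Map<String, String> credentials);
}

// Hypothetical chain: assumes multi-factor means "all modules must pass".
final class AuthChain {
    private final List<AuthModule> modules;

    AuthChain(AuthModule... modules) {
        this.modules = Arrays.asList(modules);
    }

    boolean authenticate(Map<String, String> credentials) {
        // An empty chain authenticates nobody.
        if (modules.isEmpty()) {
            return false;
        }
        // Stop at the first factor that fails.
        for (AuthModule module : modules) {
            if (!module.authenticate(credentials)) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch an LDAP-backed module would simply be one more AuthModule in the chain. Each module reads only the credential entries it cares about, so adding a factor does not change the others.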

= TokenAuth integration
This includes tasks that employ the TokenAuth framework and the relevant implementation(s) to enable support in various Hadoop components across the ecosystem for typical enterprise deployments. Currently we have the following in mind:
1. Enable the Web SSO flow for web interfaces such as HDFS and YARN;
2. Enable the REST SSO flow for REST interfaces such as Oozie;
3. Add Thrift and Hive JDBC support using TokenAuth. We consider this support because it is an important interface for enterprises to interact with data;
4. Enable access to ZooKeeper using TokenAuth, since it's widely used as the coordinator across the ecosystem.

I regard decoupling the pluggable framework from any specific implementation as important: we are addressing similar requirements, but we have different implementation considerations in approaches such as the ones represented by HADOOP-9392 and HADOOP-9533. For example, to support pluggable authentication, HADOOP-9392 prefers JAAS-based authentication modules, while HADOOP-9533 suggests using Apache Shiro. With this decoupling we could best collaborate and contribute. As far as I understand, you might agree with this approach, as can be seen in your recent email: "decouple the pluggable framework from any specific central server implementation". If I understood you correctly, do you think that for the initial iteration we have to have two central servers, such as a HAS server and an HSSO server? If not, do you think it would work for us to have HAS as a community effort, just as with the TokenAuth framework, with both of us contributing to the implementation?

To proceed, I will try to align our views, complementing your proposal and addressing your concerns as follows.

= Iteration Endstate
Besides what you mentioned from the user's point of view, how about adding this consideration:
Additionally, the initial iteration would also lay the groundwork for the TokenAuth framework, with well-defined APIs, protocols, flows, and core facilities for implementations. The framework should avoid rework and big changes for future implementations.

= Terminology and Naming
It would be great if we could unify the related terminology in this effort, at least at the framework level. This could probably be achieved in the process of defining the relevant APIs for the TokenAuth framework.

= Project scope
It's great that we have a common list in scope for the first iteration, as you mentioned:
client types: REST, CLI, UI