I am trying to learn how Kerberos can be implemented in Hadoop.
I have gone through this doc
I have also gone through Basic Kerberos stuff (http://web.mit.edu/kerberos/,
After studying these resources, I have summarized my understanding in the
diagram below.
*Scenario: A user logs on to his computer, is authenticated by Kerberos,
and submits a MapReduce job.*
(Please read the contents below the diagram it hardly needs 5 minutes of
[image: Inline image 3]
I would like to explain the above diagram and ask questions about a few of
the steps (highlighted in yellow below).
Numbers on a yellow background represent the entire flow (numbers 1 to 19).
DT (on a red background) represents the *Delegation Token*.
BAT (on a green background) represents the *Block Access Token*.
JT (on a brown background) represents the *Job Token*.
*Steps 1, 2, 3 and 4 represent:*
Request for a TGT (Ticket Granting Ticket)
Request for a service Ticket for Name Node.
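To make steps 1 to 4 concrete, here is a toy Python sketch of how I picture the exchange. It is not real Kerberos: `ToyKDC`, `seal` and `verify` are names I made up, an HMAC stands in for ticket encryption, password checks and timestamps are left out, and I am assuming that steps 1 and 2 are the TGT exchange and steps 3 and 4 the service-ticket exchange.

```python
import hashlib
import hmac
import json
import os


def seal(key: bytes, payload: dict) -> dict:
    """Toy stand-in for ticket encryption: attach an HMAC computed with `key`."""
    body = json.dumps(payload, sort_keys=True).encode()
    return {"body": payload, "mac": hmac.new(key, body, hashlib.sha256).hexdigest()}


def verify(key: bytes, ticket: dict) -> bool:
    """Return True only if the ticket was sealed with `key` and not altered."""
    body = json.dumps(ticket["body"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, ticket["mac"])


class ToyKDC:
    """Hypothetical KDC holding its own key plus one key per registered service."""

    def __init__(self):
        self.tgs_key = os.urandom(16)
        self.service_keys = {}

    def register_service(self, name: str) -> bytes:
        # The service (e.g. the NameNode) keeps a copy of this key.
        key = os.urandom(16)
        self.service_keys[name] = key
        return key

    def request_tgt(self, user: str) -> dict:
        # Assumed steps 1-2: the user asks for and receives a TGT.
        return seal(self.tgs_key, {"user": user, "type": "TGT"})

    def request_service_ticket(self, tgt: dict, service: str) -> dict:
        # Assumed steps 3-4: the user presents the TGT and receives a service ticket.
        if not verify(self.tgs_key, tgt):
            raise PermissionError("invalid TGT")
        return seal(self.service_keys[service],
                    {"user": tgt["body"]["user"], "service": service})


kdc = ToyKDC()
nn_key = kdc.register_service("namenode")
tgt = kdc.request_tgt("alice")
nn_ticket = kdc.request_service_ticket(tgt, "namenode")
print(verify(nn_key, nn_ticket))
```

In this sketch the NameNode only needs `nn_key` to check the ticket; it never talks to the KDC itself.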
Question 1) Where should the KDC be located? Can it be on the machine where
my NameNode or JobTracker runs?
*Steps 5, 6, 7, 8 and 9 represent:*
Show the service ticket to the NameNode and get an acknowledgement.
The NameNode will issue a *Delegation Token* (red).
The user will specify the token renewer (in this case, the JobTracker).
Question 2) The user submits this *Delegation Token* along with the job to
the JobTracker. Will the *Delegation Token* be shared with the TaskTrackers?
*Steps 10, 11, 12, 13 and 14 represent:*
Ask the KDC for a service ticket for the JobTracker and receive it.
Show this ticket to the JobTracker and get an ACK from the JobTracker.
Submit *Job + Delegation Token* to the JobTracker.
*Steps 15, 16 and 17 represent:*
Generate the Block Access Token and distribute it to all DataNodes.
Send the block ID and Block Access Token to the JobTracker, which will
pass them on to the TaskTrackers.
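Here is a small Python sketch of how I imagine a Block Access Token being checked. It rests on an assumption I have not confirmed: that the NameNode and DataNodes share a secret key and the token carries a keyed hash over its fields, so a DataNode can validate it without contacting the NameNode. The function names, the field layout and the choice of HMAC-SHA1 are mine, for illustration only.

```python
import hashlib
import hmac
import time


def make_block_token(shared_key, user, block_id, mode, ttl_s=600, now=None):
    """NameNode side (assumed): sign user, block ID, access mode and expiry."""
    issued = time.time() if now is None else now
    expiry = int(issued) + ttl_s
    token_id = f"{user}|{block_id}|{mode}|{expiry}"
    mac = hmac.new(shared_key, token_id.encode(), hashlib.sha1).hexdigest()
    return {"id": token_id, "mac": mac}


def datanode_accepts(shared_key, token, block_id, mode, now=None):
    """DataNode side (assumed): recompute the MAC, then check block, mode, expiry."""
    now = time.time() if now is None else now
    expected = hmac.new(shared_key, token["id"].encode(), hashlib.sha1).hexdigest()
    if not hmac.compare_digest(expected, token["mac"]):
        return False  # wrong key or tampered token
    # rsplit keeps a user name containing "|" in one piece.
    _user, tok_block, tok_mode, expiry = token["id"].rsplit("|", 3)
    return int(tok_block) == block_id and tok_mode == mode and now < int(expiry)


key = b"hypothetical-shared-key"
token = make_block_token(key, "alice", 42, "READ", now=1000)
print(datanode_accepts(key, token, 42, "READ", now=1001))
```

If this picture is right, whoever reads the block (the TaskTracker's task, as I understand it) only has to present the token together with the block ID.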
Who will ask the NameNode for the Block Access Token and block ID:
the JobTracker or the TaskTracker?
Sorry, I missed number 18 by mistake.
The JobTracker generates a *Job Token* (brown) and passes it to the TaskTrackers.
Can I conclude that there will be one Delegation Token per user, distributed
throughout the cluster, and one Job Token per job? That is, a user will have
only one *Delegation Token* and many Job Tokens (equal to the number of jobs
he has submitted).
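To state that conclusion precisely, here is a toy Python model of the bookkeeping I have in mind. It encodes my assumption (one Delegation Token per user, one Job Token per job), not confirmed Hadoop behavior; `ToyCluster` and its methods are hypothetical names.

```python
from itertools import count


class ToyCluster:
    """Hypothetical bookkeeping: tokens are just labelled strings here."""

    def __init__(self):
        self._job_numbers = count(1)
        self.delegation_tokens = {}  # user -> Delegation Token (assumed one per user)
        self.job_tokens = {}         # job ID -> Job Token (assumed one per job)

    def delegation_token_for(self, user):
        # Assumption under question: reuse the same token for every job of a user.
        if user not in self.delegation_tokens:
            self.delegation_tokens[user] = f"DT-{user}"
        return self.delegation_tokens[user]

    def submit_job(self, user):
        dt = self.delegation_token_for(user)
        job_id = f"job_{next(self._job_numbers)}"
        self.job_tokens[job_id] = f"JT-{job_id}"
        return job_id, dt, self.job_tokens[job_id]


cluster = ToyCluster()
for _ in range(3):
    cluster.submit_job("alice")
print(len(cluster.delegation_tokens), len(cluster.job_tokens))
```

If Hadoop actually issues a fresh Delegation Token for every job submission, then `delegation_token_for` is the part of this model that is wrong.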
*Please tell me if I missed something or I was wrong at some point in my
Thanks for your help.