Xiao Yu 2013-03-21, 22:42
Harsh J 2013-03-22, 11:30
Xiao Yu 2013-03-22, 15:04
Re: In the constructor of JobInProgress, why is it safe to call FileSystem.closeAllForUGI()?
Devaraj Das 2013-03-29, 15:08
Hey Xiao, iirc, the ugi will be different for the second ec2-user's job
submission. So they should not interfere...
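To make that concrete, here is a minimal sketch in plain Java, not Hadoop code. It assumes the FileSystem cache is keyed by the caller's ugi and that two ugi objects are equal only when they are the same login instance, not merely when the user names match. UserHandle, FakeFs and FsCache are hypothetical stand-ins, and the sketch keeps one file system per user for brevity.

```java
import java.util.HashMap;
import java.util.Map;

public class UgiCacheSketch {
    /** Hypothetical stand-in for a ugi: equality is by instance, not by name. */
    static final class UserHandle {
        final String name;
        UserHandle(String name) { this.name = name; }
        // equals/hashCode are deliberately not overridden (assumed identity semantics).
    }

    /** Hypothetical stand-in for a cached FileSystem. */
    static final class FakeFs {
        boolean closed = false;
        void close() { closed = true; }
    }

    /** Hypothetical cache keyed by the user handle. */
    static final class FsCache {
        private final Map<UserHandle, FakeFs> cache = new HashMap<>();

        synchronized FakeFs get(UserHandle user) {
            return cache.computeIfAbsent(user, u -> new FakeFs());
        }

        /** Closes only the entry whose key is this exact handle. */
        synchronized void closeAllFor(UserHandle user) {
            FakeFs fs = cache.remove(user);
            if (fs != null) {
                fs.close();
            }
        }
    }

    /** Returns { job1's fs closed?, job2's fs closed? } after job2 cleans up. */
    static boolean[] demo() {
        FsCache cache = new FsCache();
        // Two submissions by the same user name, each with its own handle.
        UserHandle job1User = new UserHandle("ec2-user");
        UserHandle job2User = new UserHandle("ec2-user");
        FakeFs fs1 = cache.get(job1User);
        FakeFs fs2 = cache.get(job2User);
        cache.closeAllFor(job2User); // job2's constructor cleanup
        return new boolean[] { fs1.closed, fs2.closed };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("job1 fs closed: " + r[0] + ", job2 fs closed: " + r[1]);
    }
}
```

If the handles were instead compared by user name, job2's cleanup would close job1's file system as well, which is the situation Xiao is worried about.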
On Mar 22, 2013 8:05 AM, "Xiao Yu" <[EMAIL PROTECTED]> wrote:
> Thanks a lot for the reply.
> I have made several changes here and there, but I do not think they are
> very relevant. The most relevant modification to the control flow of JIP
> construction is at the end of JobInProgress$initTasks(), where I add all the
> map tasks of this job to a data structure in the JT that is maintained by my
> code. May I describe a case that I think might be a problem, so you can
> help me determine whether it is a bug or whether I am not using Hadoop correctly?
> 1. In my environment, there is only one user, ec2-user, which is the owner
> of JT, the submitter of all the jobs, the owner of HDFS, etc....
> (1) job1 is submitted and its JIP, jip1, is constructed successfully
> (2) jip1 starts its initTasks() function, and inside that function it calls
> generateAndStoreTokens(), which calls tokenStorage.writeTokenStorageFile(),
> which opens a JobToken file, jobToken1, writes to it, and has not closed it
> yet. This file is opened with ownership ec2-user
> (3) job2 is submitted, and in the finally clause of its constructor it calls
> FileSystem.closeAllForUGI('ec2-user'), inside which it calls
> (4) in version 1.1.0 and before, dfsClient has a leaseChecker object that
> tracks all files open for writing and their output streams. dfsClient.close()
> calls leaseChecker.close(), which pops the files one by one inside a
> synchronized block but closes the related output stream outside the block. So
> in jip2 constructor, FileSystem.closeAllForUGI('ec2-user'), it
> * and its outputstream, but not close it yet.
> (5) the output stream from (2) of jip1 closes its stream of jobToken1
> (6) the output stream from (4) of jip2 tries to close its stream of
> jobToken1 but fails with a null pointer exception.
> 3. Even if the leaseChecker object is not used in later versions, I still do
> not understand why it is safe to call closeAllForUGI('ec2-user') in the job2
> constructor when job1 could be writing a jobToken file in initTasks(), also
> with ownership 'ec2-user'.
> Xiao Yu
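The interleaving in steps (2) to (6) above can be replayed with a minimal sketch in plain Java. This is not the DFSClient code: FakeStream and OpenFiles are hypothetical stand-ins, and the steps run in one thread in the reported order so the outcome is deterministic. It only shows the general hazard of popping an entry under a lock and closing it after the lock is released, while the original owner still holds a reference to the same stream.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class LeaseCloseSketch {
    /** Hypothetical stream: close() clears a field, so a second close() dereferences null. */
    static final class FakeStream {
        private StringBuilder buffer = new StringBuilder("token");

        void close() {
            buffer.setLength(0); // NullPointerException if already closed
            buffer = null;
        }
    }

    /** Hypothetical table of files that are open for writing. */
    static final class OpenFiles {
        private final Deque<FakeStream> pending = new ArrayDeque<>();

        synchronized void put(FakeStream s) { pending.push(s); }

        /** Pops under the lock; the caller closes the stream after the lock is released. */
        synchronized FakeStream pop() { return pending.poll(); }
    }

    /** Replays steps (2), (4), (5), (6) in order and reports what the last close did. */
    static String replay() {
        OpenFiles table = new OpenFiles();

        FakeStream jobToken1 = new FakeStream();
        table.put(jobToken1);            // (2) jip1 opens jobToken1 for writing

        FakeStream popped = table.pop(); // (4) jip2's cleanup pops it, not closed yet

        jobToken1.close();               // (5) jip1 closes its own stream

        try {
            popped.close();              // (6) jip2's cleanup closes the same stream
            return "ok";
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

Closing inside the synchronized block, or making close() idempotent, would remove the double close in this sketch. Whether either applies to the real leaseChecker is a separate question.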
> On Fri, Mar 22, 2013 at 7:30 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> > The current user (UserGroupInformation.getCurrentUser()) is the user
> > active in the RPC call that's invoking these functions, and not
> > necessarily the JT user.
> > However, given that the JIP construction is outside of a synchronized
> > step and can potentially happen in parallel with another JIP request,
> > you may have identified a bug here.
> > I've not seen this happen though, even at high loads of submits from a
> > single user (where I think this could happen). Can you detail your
> > changes, since they could be related to this as well? The UGI
> > compare inside of closeAllForUGI is probably protective enough but
> > it'd still be worth looking into.
> > On Fri, Mar 22, 2013 at 4:12 AM, Xiao Yu <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > This might be a naive question, but I am having a difficult time
> > > understanding it. At the end of the constructor of JobInProgress, in the
> > > finally clause, the code calls
> > > FileSystem.closeAllForUGI(UserGroupInformation.getCurrentUser()), but
> > > is it safe?
> > >
> > > My concern is that the current user is the owner of the jobtracker, so it
> > > will close all the files the jobtracker is writing, such as a jobtoken file
> > > that another jip is currently writing.
> > >
> > > I modified some of the code of hadoop-1.1.0 for my research project and
> > > saw the following error. It could be some bug in my code, but I suspect it
> > > is a combined effect of this closeAllForUGI function and perhaps a race
> > > condition in the DFSClient$LeaseChecker.close().
> > >
> > > Could you help me understand why it is safe to call this
> > > FileSystem.closeAllForUGI function at the end of the JobInProgress