We've recently run into jobtracker memory issues on our new hadoop cluster. A heap dump shows that there are thousands of copies of DistributedFileSystem kept in FileSystem$Cache, a bit over one for each job run on the cluster and their jobconf objects support this view. I believe these are created when the .staging directories get cleaned up but I may be wrong on that.
>From what I can tell in the dump, the username (probably not ugi, hard to tell), scheme and authority parts of the Cache$Key are the same across multiple objects in FileSystem$Cache. I can only assume that the usergroupinformation piece differs somehow every time it's created.
We're using CDH4.2, MR1, CentOS 6.3 and Java 1.6_31. Kerberos, ldap and so on are not enabled.
Is there any known reason for this type of behavior?
Marcin Mejran 2013-04-17, 15:14
agile.java@...) 2013-04-27, 10:40