-
Cleanup of distcache after running map reduce jobs
Mike Hugo 2013-03-07, 17:37
We noticed that after running several thousand map reduce jobs, our file system was filling up. The culprit is the libjars that get uploaded to the distributed cache for each job; it doesn't look like they're ever being deleted.
Is there a mechanism to clear the distributed cache, or should this happen automatically?
This is probably a straight up hadoop question, but I'm asking here first in case you've seen this sort of thing with accumulo before.
Thanks!
Mike
-
Re: Cleanup of distcache after running map reduce jobs
John Vines 2013-03-07, 18:26
The cache will clear itself after 24 hours, if I remember correctly. I have hit this issue before and, provided you're hitting the same issue I've seen, your options are to either:
1. up the number of inodes for your system
2. add accumulo to the child opts classpath via mapred-site.xml and then use the normal hadoop command to kick off your job instead of the accumulo/tool.sh script
-
Re: Cleanup of distcache after running map reduce jobs
Mike Hugo 2013-03-07, 20:18
Thanks John!
I ended up playing with some settings in mapred-site.xml, namely mapreduce.tasktracker.local.cache.numberdirectories and local.cache.size, and that seems to have resolved our issue for the moment.
Mike
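For readers who land here with the same problem, the following is a minimal sketch of how the two settings named above might look in mapred-site.xml. The property names are taken from the message as written. The values are purely illustrative assumptions, not the ones Mike used, and should be sized to the local disks and inode budget of your own nodes. local.cache.size is given in bytes.

```xml
<!-- Sketch only: fragment to place inside <configuration> in mapred-site.xml. -->
<!-- Values are illustrative assumptions, not taken from the thread. -->
<property>
  <!-- Upper bound on the number of directories kept in the local distributed cache. -->
  <name>mapreduce.tasktracker.local.cache.numberdirectories</name>
  <value>5000</value>
</property>
<property>
  <!-- Target size of the local distributed cache in bytes (2 GB here). -->
  <name>local.cache.size</name>
  <value>2147483648</value>
</property>
```

Lower limits should make the cache eligible for cleanup sooner. A change like this most likely needs a TaskTracker restart to take effect, and the exact property names can differ between Hadoop versions, so check the mapred-default.xml that ships with your release.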