Re: Auto clean DistCache?
Hemanth Yamijala 2013-03-28, 05:09
I don't think it is documented in mapred-default.xml, where it should
ideally be; I could only find it in the code. You can take a look here if
you are interested: http://goo.gl/k5xsI
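
For reference, a minimal sketch of how setting it might look in
mapred-site.xml (the value is in bytes, 5GB here; the built-in default
is believed to be around 10GB):

  <property>
    <!-- Rough upper limit on the local distributed cache size; the
         TaskTracker purges unused entries once it is exceeded. -->
    <name>local.cache.size</name>
    <value>5368709120</value>
  </property>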

Thanks
Hemanth
On Wed, Mar 27, 2013 at 7:07 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:

> Oh! Good to know! It keeps track even of month-old entries??? There is
> no TTL?
>
> I was not able to find the documentation for local.cache.size or
> mapreduce.tasktracker.cache.local.size in the 1.0.x branch. Do you know
> where I can find it?
>
> Thanks,
>
> JM
>
> 2013/3/27 Koji Noguchi <[EMAIL PROTECTED]>:
> >> Else, I will go for a custom script to delete all directories (and
> >> content) older than 2 or 3 days…
> >>
> > TaskTracker (or NodeManager in 2.*) keeps the list of dist cache entries
> > in memory. So if an external process (like your script) starts deleting
> > dist cache files, the list and the disk get out of sync and you'll start
> > seeing task initialization failures due to file-not-found errors.
> >
> > Koji
> >
> >
> > On Mar 26, 2013, at 9:00 PM, Jean-Marc Spaggiari wrote:
> >
> >> For the situation I faced, it was really a disk space issue, not
> >> related to the number of files. It was writing to a small partition.
> >>
> >> I will try local.cache.size or
> >> mapreduce.tasktracker.cache.local.size to see if I can keep the final
> >> total size under 5GB... Else, I will go for a custom script to delete
> >> all directories (and content) older than 2 or 3 days...
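> >>
> >> For illustration, a minimal sketch of what such a purge script might
> >> look like (the path is assumed from the configuration described in the
> >> original mail below; see Koji's warning above about the TaskTracker's
> >> in-memory list before relying on this):
> >>
> >>   # ${MAPRED_LOCAL_DIR} stands in for the node's mapred.local.dir
> >>   # Deletes top-level distcache subdirectories older than 2 days
> >>   find ${MAPRED_LOCAL_DIR}/taskTracker/hadoop/distcache \
> >>     -mindepth 1 -maxdepth 1 -type d -mtime +2 -exec rm -rf {} +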
> >>
> >> Thanks,
> >>
> >> JM
> >>
> >> 2013/3/26 Abdelrahman Shettia <[EMAIL PROTECTED]>:
> >>> Let me clarify: if there are lots of files or directories, up to about
> >>> 32K (depending on the OS/filesystem configuration), in those distributed
> >>> cache dirs, the OS will not be able to create any more files/dirs, so
> >>> M-R jobs won't get initiated on those TaskTracker machines. Hope this
> >>> helps.
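> >>>
> >>> As an illustration, one assumed way to check how close a dist cache
> >>> dir is to that limit (path taken from the original mail below,
> >>> relative to the node's mapred.local.dir):
> >>>
> >>>   # counts the entries directly under the distcache dir
> >>>   find ${MAPRED_LOCAL_DIR}/taskTracker/hadoop/distcache \
> >>>     -mindepth 1 -maxdepth 1 | wc -l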
> >>>
> >>>
> >>> Thanks
> >>>
> >>>
> >>> On Tue, Mar 26, 2013 at 1:44 PM, Vinod Kumar Vavilapalli <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>
> >>>> All the files are never opened at the same time, so you shouldn't
> >>>> see any "# of open files exceeded" error.
> >>>>
> >>>> Thanks,
> >>>> +Vinod Kumar Vavilapalli
> >>>> Hortonworks Inc.
> >>>> http://hortonworks.com/
> >>>>
> >>>> On Mar 26, 2013, at 12:53 PM, Abdelrhman Shettia wrote:
> >>>>
> >>>> Hi JM ,
> >>>>
> >>>> Actually, these dirs need to be purged by a script that keeps the
> >>>> last 2 days' worth of files. Otherwise you may run into a "# of open
> >>>> files exceeded" error.
> >>>>
> >>>> Thanks
> >>>>
> >>>>
> >>>> On Mar 25, 2013, at 5:16 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> Each time my MR job is run, a directory is created on the TaskTracker
> >>>> under mapred/local/taskTracker/hadoop/distcache (based on my
> >>>> configuration).
> >>>>
> >>>> I looked at the directory today, and it's hosting thousands of
> >>>> directories and more than 8GB of data there.
> >>>>
> >>>> Is there a way to automatically delete this directory when the job is
> >>>> done?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> JM
> >>>
> >
>