HDFS >> mail # user >> Re: Auto clean DistCache?


Re: Auto clean DistCache?
Let me clarify: if the number of files or directories in those distributed
cache dirs reaches roughly 32K (the exact limit depends on the filesystem
and OS configuration), the OS will not be able to create any more
files/dirs there, and M-R jobs won't get initiated on those TaskTracker
machines.
Hope this helps.
Thanks
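The ~32K figure above corresponds to the ext3 per-directory subdirectory limit; other filesystems have different limits. A minimal sketch of a check for this condition, assuming JM's cache path from later in the thread (the `CACHE_DIR` default and the helper name are illustrative, not part of Hadoop):

```shell
#!/bin/sh
# Hypothetical check for the failure mode described above: ext3 caps a
# directory at roughly 32,000 subdirectories, so a TaskTracker whose
# distcache dir fills up can no longer create per-job directories.
check_subdir_limit() {
  dir="$1"; limit="${2:-32000}"
  [ -d "$dir" ] || return 0   # nothing to report if the path is absent
  # Count only the immediate subdirectories, which is what the limit caps.
  count=$(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l)
  echo "$dir: $count subdirectories (limit ~$limit)"
  [ "$count" -lt "$limit" ] || echo "WARNING: at or over the limit"
}

check_subdir_limit "${CACHE_DIR:-/mapred/local/taskTracker/hadoop/distcache}"
```

Run periodically (e.g. from cron) this gives early warning before job initialization starts failing.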
On Tue, Mar 26, 2013 at 1:44 PM, Vinod Kumar Vavilapalli <
[EMAIL PROTECTED]> wrote:

>
> Not all the files are opened at the same time, so you shouldn't see
> any "# of open files exceeded" error.
>
> Thanks,
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/
>
> On Mar 26, 2013, at 12:53 PM, Abdelrhman Shettia wrote:
>
> Hi JM ,
>
> Actually these dirs need to be purged by a script that keeps the last 2
> days' worth of files; otherwise you may run into a "# of open files
> exceeded" error.
>
> Thanks
>
>
> On Mar 25, 2013, at 5:16 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]>
> wrote:
>
> Hi,
>
> Each time my MR job is run, a directory is created on the TaskTracker
> under mapred/local/taskTracker/hadoop/distcache (based on my
> configuration).
>
> I looked at the directory today, and it's hosting thousands of
> directories and more than 8GB of data there.
>
> Is there a way to automatically delete this directory when the job is done?
>
> Thanks,
>
> JM
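The 2-day purge script Abdelrhman suggests could be sketched with find(1). This is an assumption-laden sketch, not part of Hadoop: the path is the one from JM's configuration, the 2-day window is the one from the thread, and the function name is made up; adapt all three for your cluster.

```shell
#!/bin/sh
# Sketch of a cron-able purge script: delete per-job distributed-cache
# directories whose contents were last modified more than 2 days ago.
purge_distcache() {
  dir="$1"
  [ -d "$dir" ] || return 0   # nothing to do if the path is absent
  # -mindepth/-maxdepth restrict deletion to the immediate per-job
  # subdirectories, so the cache root itself is never removed.
  find "$dir" -mindepth 1 -maxdepth 1 -type d -mtime +2 -exec rm -rf {} +
}

purge_distcache "${CACHE_DIR:-/mapred/local/taskTracker/hadoop/distcache}"
```

Only run this when no jobs are localizing files, or narrow the window; deleting a directory a running task is reading from will fail that task.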