HDFS user mailing list: Maximum number of files in directory? (in hdfs)


Stuart Smith 2010-08-18, 00:44
Re: Maximum number of files in directory? (in hdfs)

On Aug 17, 2010, at 5:44 PM, Stuart Smith wrote:
> I started to break the files into subdirectories out of habit (from working on ntfs/etc), but it occurred to me that maybe, from a performance perspective, it doesn't really matter on hdfs.
>
> Does it? Is there some recommended limit on the number of files to store in one directory on hdfs? I'm thinking thousands to millions, so we're not talking about INT_MAX or anything, but a lot.
>
> Or is it only limited by my sanity :) ?

We have a directory with several thousand files in it.

It is always a pain when we hit it, because the client heap size needs to be increased to do anything with it: directory listings, web UIs, distcp, etc. Any sort of manipulation in that directory is also slower.
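For a sense of why the heap grows with the directory: a plain client-side listing materializes a FileStatus for every entry in one in-memory array, so a huge flat directory means a huge client heap (which is typically bumped with something like HADOOP_CLIENT_OPTS="-Xmx2g"). A minimal sketch against the FileSystem API, with a made-up path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBigDir {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // listStatus() returns the whole directory as one in-memory array,
    // so a directory with millions of entries needs a matching -Xmx on the client.
    FileStatus[] entries = fs.listStatus(new Path("/data/bigdir"));  // hypothetical path
    System.out.println("entries: " + entries.length);
    fs.close();
  }
}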

My recommendation: don't do it.  Directories, AFAIK, are relatively cheap resource-wise compared to piling lots of files into one.
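One way to do the split (just a sketch, not anything prescribed in this thread) is to bucket files by a hash of the file name, so no single directory grows without bound:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BucketedPath {
  // Spread files across e.g. 256 subdirectories by hashing the file name,
  // so no single directory accumulates millions of entries.
  static Path bucketFor(String fileName, Path baseDir, int buckets) {
    int bucket = (fileName.hashCode() & Integer.MAX_VALUE) % buckets;
    return new Path(new Path(baseDir, String.format("%03d", bucket)), fileName);
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // file name and base directory are hypothetical
    Path target = bucketFor("sample-file.bin", new Path("/data/files"), 256);
    fs.mkdirs(target.getParent());
    System.out.println("would write to " + target);
  }
}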

[Hopefully these files are large.  Otherwise they should be joined together; if they aren't, you're going to take a performance hit processing them *and* storing them...]
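For the small-file case, one common way to join them (again just a sketch, using the SequenceFile API with made-up paths) is to pack each file as a key/value record in a single SequenceFile:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path srcDir = new Path("/data/small-files");   // hypothetical source directory
    Path packed = new Path("/data/packed.seq");    // hypothetical output file
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, packed, Text.class, BytesWritable.class);
    try {
      for (FileStatus st : fs.listStatus(srcDir)) {
        byte[] data = new byte[(int) st.getLen()];  // assumes each file is small
        FSDataInputStream in = fs.open(st.getPath());
        in.readFully(0, data);
        in.close();
        // key = original path, value = raw bytes of the file
        writer.append(new Text(st.getPath().toString()), new BytesWritable(data));
      }
    } finally {
      writer.close();
    }
  }
}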
stu24mail@... 2010-08-18, 02:02