Maximum number of files in directory? (in hdfs)
Stuart Smith 2010-08-18, 00:44
Hello, I'm looking at storing a large number of files under one directory.
I started to break the files into subdirectories out of habit (from working on NTFS, etc.), but it occurred to me that maybe, from a performance perspective, it doesn't really matter on HDFS.
Does it? Is there some recommended limit on the number of files to store in one directory on HDFS? I'm thinking thousands to millions, so we're not talking about INT_MAX or anything, but a lot.
Or is it only limited by my sanity :) ?
I suppose it would come down to the data structure(s) used by the namenode when tracking file metadata. But I don't know what those are - I did skim the HDFS architecture document, but didn't see anything conclusive.
Take care, -stu
Re: Maximum number of files in directory? (in hdfs)
Allen Wittenauer 2010-08-18, 01:00
On Aug 17, 2010, at 5:44 PM, Stuart Smith wrote:
> I started to break the files into subdirectories out of habit (from working on ntfs/etc), but it occurred to me that maybe (from a performance perspective), it doesn't really matter on hdfs.
>
> Does it? Is there some recommended limit on the number of files to store in one directory on hdfs? I'm thinking thousands to millions, so we're not talking about INT_MAX or anything, but a lot.
>
> Or is it only limited by my sanity :) ?
We have a directory with several thousand files in it.
It is always a pain when we hit it because the client heap size needs to be increased to do anything in it: directory listings, web uis, distcp, etc, etc, etc. Doing any sort of manipulation in that dir is also slower.
My recommendation: don't do it. Directories, AFAIK, are relatively cheap resource-wise compared with lots of files in one directory.
[Hopefully these files are large. Otherwise they should be joined together... if not, you're going to take a performance hit processing them *and* storing them...]
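One way to follow the "use subdirectories" advice is to derive the subdirectory from a hash of the file name, so files spread evenly without any bookkeeping. What follows is only a minimal sketch, not something from this thread: the function name `sharded_path`, the base directory `/data/files`, and the choice of MD5 with two levels of two hex characters are all assumptions made for illustration. It only computes the target path; actually writing to HDFS is left to whatever client you use.

```python
import hashlib
import posixpath


def sharded_path(base_dir: str, filename: str, levels: int = 2, width: int = 2) -> str:
    """Build base_dir/<h1>/<h2>/.../filename from a hash of the file name.

    Hypothetical helper: each level uses `width` hex characters of the
    MD5 digest, so every directory has at most 16**width subdirectories.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Take consecutive slices of the digest, one per directory level.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    # HDFS paths use forward slashes, so join with posixpath on any platform.
    return posixpath.join(base_dir, *parts, filename)


if __name__ == "__main__":
    # Example file name and base directory are made up for illustration.
    print(sharded_path("/data/files", "part-00001.bin"))
```

With the defaults there are 256 x 256 = 65,536 leaf directories, so a million files would average roughly 15 per directory, and no directory holds more than 256 subdirectories. The same file name always maps to the same path, so lookups need no index.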
Re: Maximum number of files in directory? (in hdfs)
stu24mail@... 2010-08-18, 02:02
Thanks! I'll go with keeping my sanity then.
The files will all be >= 64 MB.
Take care, -stu