HDFS, mail # user - Storing millions of small files

Re: Storing millions of small files
Mohammad Tariq 2012-05-22, 10:03
Hi Brendan,
      Every file, directory and block in HDFS is represented as an
object in the namenode’s memory, each of which occupies roughly 150
bytes. When we store many small files in HDFS, these small files occupy
a large portion of the namespace (a large overhead on the namenode). As
a consequence, the namenode's memory becomes the limit long before the
datanodes' disks fill up, so disk space is underutilized. If you want to
handle "small files", you should go for Hadoop sequence files or HAR
files, depending upon your use case. HBase is also an option, but again
it depends upon your use case. I would suggest you go through this blog:
"http://www.cloudera.com/blog/2009/02/the-small-files-problem/". It is a
must read for people managing large numbers of small files.
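As a rough illustration of the 150-byte rule of thumb, the Python sketch below estimates namenode heap usage by counting one object per file plus one per block. Directories are ignored, and the 64 MiB block size and the file counts are assumptions chosen for the example, not figures from this thread; the helper names are hypothetical.

```python
# Rough namenode heap estimate for storing files in HDFS.
# Assumption: each file and each block is one namenode object of about
# 150 bytes (rule of thumb); directories are ignored and real overhead
# varies by Hadoop version and JVM.
BYTES_PER_OBJECT = 150

MiB = 1024 * 1024
GiB = 1024 * MiB
BLOCK_SIZE = 64 * MiB  # assumed block size for this example


def blocks_per_file(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Blocks needed for one file (ceiling division; an empty file needs 0)."""
    return -(-file_size // block_size)


def namenode_bytes(num_files: int, file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Approximate heap bytes for num_files files, each file_size bytes long."""
    objects_per_file = 1 + blocks_per_file(file_size, block_size)
    return num_files * objects_per_file * BYTES_PER_OBJECT


if __name__ == "__main__":
    # The same 10,000 GiB of data, laid out two different ways.
    small = namenode_bytes(10_240_000, 1 * MiB)  # ~10 million 1 MiB files
    large = namenode_bytes(10_000, 1 * GiB)      # 10,000 files of 1 GiB
    print(f"small files: {small:,} bytes")  # small files: 3,072,000,000 bytes
    print(f"large files: {large:,} bytes")  # large files: 25,500,000 bytes
```

Under these assumptions the small-file layout needs roughly 120 times more namenode memory for the same data. A small file does not fill a whole block on the datanode's disk, so the cost shows up in the namenode's namespace rather than in raw storage.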
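The idea behind a sequence file or HAR is to pack many small files into one large HDFS file, so the namenode tracks a single file and its blocks instead of millions of objects. The Python sketch below shows only that packing idea, using a made-up length-prefixed record layout (name, then content); it is not Hadoop's actual SequenceFile or HAR format, and the `pack`/`unpack` helpers are hypothetical. In Hadoop itself you would use the Java `SequenceFile` classes or the `hadoop archive` tool.

```python
import io
import struct
from typing import BinaryIO, Dict, Iterator, Tuple

# Hypothetical record layout, NOT Hadoop's real SequenceFile/HAR format:
#   [4-byte big-endian name length][name bytes][4-byte content length][content]
_LEN = struct.Struct(">I")


def pack(files: Dict[str, bytes], out: BinaryIO) -> None:
    """Append each (name, content) pair as one record in a single stream."""
    for name, content in files.items():
        key = name.encode("utf-8")
        out.write(_LEN.pack(len(key)))
        out.write(key)
        out.write(_LEN.pack(len(content)))
        out.write(content)


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes or fail on a truncated container."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated record")
    return data


def unpack(stream: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, content) pairs sequentially until end of stream."""
    while True:
        header = stream.read(_LEN.size)
        if not header:
            return  # clean end: no more records
        if len(header) != _LEN.size:
            raise EOFError("truncated record")
        (key_len,) = _LEN.unpack(header)
        name = _read_exact(stream, key_len).decode("utf-8")
        (content_len,) = _LEN.unpack(_read_exact(stream, _LEN.size))
        yield name, _read_exact(stream, content_len)


if __name__ == "__main__":
    small_files = {"logs/a.txt": b"hello", "logs/b.txt": b"world!"}
    container = io.BytesIO()  # stands in for one large HDFS file
    pack(small_files, container)
    container.seek(0)
    print(dict(unpack(container)) == small_files)  # True
```

This sketch only supports sequential reads: finding one file by name means scanning the whole container. That kind of trade-off is why the choice between a sequence file, HAR, or HBase depends on how the files will be accessed.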

    Mohammad Tariq
On Tue, May 22, 2012 at 3:09 PM, Brendan cheng <[EMAIL PROTECTED]> wrote:
> Hi,
> I read the HDFS architecture doc and it said HDFS is tuned for storing large files, typically gigabytes to terabytes. What is the downside of storing millions of small files, say <10MB? Or what HDFS settings are suitable for storing small files?
> Actually, I plan to find a distributed file system for storing multiple millions of files.
> Brendan