Every file, directory, and block in HDFS is represented as an object in the namenode's memory, and each object occupies roughly 150 bytes. When we store many small files in HDFS, they consume a large portion of the namespace, which puts a heavy memory overhead on the namenode. As a consequence, the cluster's disk space is underutilized, because the namespace fills up long before the disks do.

If you want to handle small files, you should go for Hadoop sequence files or HAR files, depending on your use case. HBase is also an option, but again it depends on your use case. I would suggest you go through this blog - worth a read for people managing a large number of small files.
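To put rough numbers on the overhead, here is a back-of-the-envelope sketch. It assumes the commonly cited ~150 bytes per namespace object, and that each small file fits in a single block (true for files under the block size), so every file costs one file object plus one block object:

```python
# Rough estimate of namenode heap consumed by small-file metadata.
# Assumption: ~150 bytes per namespace object (file, directory, or block),
# and each small file occupies exactly one block.

BYTES_PER_OBJECT = 150

def namenode_heap_bytes(num_files, blocks_per_file=1):
    """Approximate namenode memory used by num_files files."""
    objects = num_files * (1 + blocks_per_file)  # file entry + its block entries
    return objects * BYTES_PER_OBJECT

# Ten million small files -> about 3 GB of namenode heap for metadata alone.
print(namenode_heap_bytes(10_000_000) / 1e9)  # ~3.0 GB
```

So ten million sub-block files cost the namenode about 3 GB of heap, regardless of how little actual data they hold - which is why packing them into sequence files or HAR archives helps.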
On Tue, May 22, 2012 at 3:09 PM, Brendan cheng <[EMAIL PROTECTED]> wrote:
> I read the HDFS architecture doc and it said HDFS is tuned for storing large files, typically gigabytes to terabytes. What is the downside of storing millions of small files like <10MB? Or what settings of HDFS are suitable for storing small files?
> Actually, I plan to find a distributed file system for storing multi-millions of files.