HDFS user mailing list: Storing millions of small files


Re: Storing millions of small files
Mongo has the best out-of-the-box experience of anything, but can be limited in
terms of how far it will scale.

HBase is a bit tricky to manage if you don't have expertise in managing
Hadoop.

Neither is a great idea if your data objects can be as large as 10MB.

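For readers weighing HBase against MongoDB as key-value stores for this, storing a small file in HBase amounts to writing its bytes as a cell value under a row key. Below is a rough sketch with a current HBase Java client (the 2012-era API differs slightly); the table name "files", column family "f", and the file paths are invented for illustration, and the table is assumed to already exist. Note also that the HBase client enforces a configurable maximum cell size (hbase.client.keyvalue.maxsize), whose default is in the same ~10MB ballpark as the caveat above.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class HBaseFileStoreSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("files"))) {

                // Store: row key = logical file name, one column holds the raw bytes.
                byte[] data = Files.readAllBytes(Paths.get("/tmp/example.pdf"));
                Put put = new Put(Bytes.toBytes("docs/example.pdf"));
                put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("data"), data);
                table.put(put);

                // Fetch it back by key.
                Result r = table.get(new Get(Bytes.toBytes("docs/example.pdf")));
                byte[] back = r.getValue(Bytes.toBytes("f"), Bytes.toBytes("data"));
                System.out.println("read back " + back.length + " bytes");
            }
        }
    }
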
On Wed, May 23, 2012 at 8:30 AM, Brendan cheng <[EMAIL PROTECTED]> wrote:

>
> Thanks for your advice! I should mention more about my use case:
> (1) millions of files to store
> (2) 99% static, no change once written
> (3) fast download / high availability
> (4) cost effective
> (5) in the future, I would like to extend it with a versioning system for the files
> Of course, from an administrative point of view, most Hadoop functionality works
> for me.
> I looked a little at HBase and I want to compare it with MongoDB, as
> both are kind of key-value stores, but MongoDB gives me more functionality than
> I need at the moment.
> What do you think?
>
> ________________________________
> > Date: Tue, 22 May 2012 21:56:31 -0700
> > Subject: Re: Storing millions of small files
> > From: [EMAIL PROTECTED]
> > To: [EMAIL PROTECTED]
> >
> > Brendan, since you are looking for a distributed file system that can store
> > many millions of files, try out MapR.  A few customers have actually
> > crossed over 1 trillion files without hitting problems.  Small files and
> > large files are handled equally well.
> >
> > Of course, if you are doing map-reduce, it is better to process more
> > data per mapper (I'd say the sweet spot is between 64M and 256M of data),
> > so it might make sense to process many small files per mapper.
> >
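One concrete way to get more data per mapper when the input is many small files is a combine-style input format, which packs several files into a single split. The sketch below uses Hadoop's CombineTextInputFormat and caps splits at 256MB; the job name, mapper, and paths are placeholders rather than anything from this thread.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SmallFilesJobSketch {
        // Identity mapper: real per-record logic would go here.
        public static class PassthroughMapper
                extends Mapper<LongWritable, Text, LongWritable, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws java.io.IOException, InterruptedException {
                ctx.write(key, value);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "small-files-demo");
            job.setJarByClass(SmallFilesJobSketch.class);
            job.setMapperClass(PassthroughMapper.class);
            job.setNumReduceTasks(0);

            // Pack many small files into each split, capped at ~256MB per mapper.
            job.setInputFormatClass(CombineTextInputFormat.class);
            CombineTextInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
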
> > On Tue, May 22, 2012 at 2:39 AM, Brendan cheng
> > <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> > I read the HDFS architecture doc, and it said HDFS is tuned for storing
> > large files, typically gigabytes to terabytes.  What is the downside of
> > storing millions of small files (<10MB)?  Or what HDFS settings are
> > suitable for storing small files?
> > Actually, I plan to find a distributed file system for storing multiple
> > millions of files.
> > Brendan
> >
>
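None of the replies above spell it out, but the usual HDFS-side answer to the small-files question is to pack many small files into a container file such as a SequenceFile (original file name as key, raw bytes as value), so the NameNode tracks a handful of large files instead of millions of tiny ones. A rough sketch with the Hadoop 2.x API, with directory and output paths invented for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class PackSmallFilesSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path input = new Path("/data/small-files");   // directory of small files (illustrative)
            Path packed = new Path("/data/packed.seq");   // one large container file

            try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(packed),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(BytesWritable.class))) {
                for (FileStatus status : fs.listStatus(input)) {
                    byte[] contents = new byte[(int) status.getLen()];
                    try (FSDataInputStream in = fs.open(status.getPath())) {
                        IOUtils.readFully(in, contents, 0, contents.length);
                    }
                    // key = original file name, value = the file's raw bytes
                    writer.append(new Text(status.getPath().getName()),
                                  new BytesWritable(contents));
                }
            }
        }
    }

The trade-off is that individual files are no longer directly addressable by path; reads go through the container (or an index such as a MapFile), which fits the 99%-static, write-once pattern described above.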