HDFS, mail # user - Storing millions of small files


Re: Storing millions of small files
Ted Dunning 2012-05-23, 10:12
Mongo has the best out-of-the-box experience of anything, but can be limited in
terms of how far it will scale.

HBase is a bit tricky to manage if you don't have expertise in managing
Hadoop.

Neither is a great idea if your data objects can be as large as 10MB.

On Wed, May 23, 2012 at 8:30 AM, Brendan cheng <[EMAIL PROTECTED]> wrote:

>
> Thanks for the advice, guys! I should say more about my use case:
> (1) millions of files to store
> (2) 99% static, no change once written
> (3) fast download, or highly available
> (4) cost effective
> (5) in the future, I would like to add a versioning system for the files
> Of course, from an administrative point of view, most Hadoop functions work
> for me.
> I looked a little at HBase and I want to compare it with MongoDB, as
> both are kind of key-value stores, but MongoDB gives me more functionality
> that I don't need at the moment.
> What do you think?
>
> ________________________________
> > Date: Tue, 22 May 2012 21:56:31 -0700
> > Subject: Re: Storing millions of small files
> > From: [EMAIL PROTECTED]
> > To: [EMAIL PROTECTED]
> >
> > Brendan, since you are looking for a distributed file system that can store
> > many millions of files, try out MapR.  A few customers have actually
> > crossed over 1 trillion files without hitting problems.  Small files or
> > large files are handled equally well.
> >
> > Of course, if you are doing map-reduce, it is better to process more
> > data per mapper (I'd say the sweet spot is between 64MB and 256MB of data),
> > so it might make sense to process many small files per mapper.
> >
> > On Tue, May 22, 2012 at 2:39 AM, Brendan cheng
> > <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> > I read the HDFS architecture doc and it said HDFS is tuned for storing
> > large files, typically gigabytes to terabytes. What is the downside of
> > storing millions of small files, like <10MB? Or what HDFS settings are
> > suitable for storing small files?
> > Actually, I plan to find a distributed file system for storing multiple
> > millions of files.
> > Brendan
> >
>
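
The suggestion quoted above, to process many small files per mapper so that each mapper sees roughly 64MB to 256MB of data, can be illustrated with a small sketch. This is only an illustration of the grouping idea, not Hadoop code: the function name `batch_small_files`, the 128MB default target, and the sample file list are all assumptions made for the example. In Hadoop itself this kind of grouping is usually handled by an input format such as CombineFileInputFormat, which this sketch does not use.

```python
from typing import Iterable, List, Tuple

MB = 1024 * 1024


def batch_small_files(
    files: Iterable[Tuple[str, int]],
    target_bytes: int = 128 * MB,  # assumed target, inside the 64MB-256MB range
) -> List[List[str]]:
    """Greedily group (path, size_in_bytes) pairs into batches.

    A batch is closed as soon as adding the next file would push it past
    target_bytes. A single file larger than the target gets its own batch.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in files:
        # Close the current batch if this file would overflow it.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical input: thirty files of 10MB each.
    sample = [(f"file-{i:03d}", 10 * MB) for i in range(30)]
    for n, batch in enumerate(batch_small_files(sample)):
        print(n, len(batch), "files")
```

With thirty 10MB files and a 128MB target, each full batch holds twelve files (120MB), so a mapper assigned one batch would process twelve small files rather than one.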