

陈桂芬 2009-05-07, 02:33
imcaptor 2009-05-07, 02:39
Jonathan Cao 2009-05-07, 03:01
Piotr Praczyk 2009-05-07, 07:27

Re: how to improve the Hadoop's capability of dealing with small files
Hey,

You can read more about why small files are difficult for HDFS at
http://www.cloudera.com/blog/2009/02/02/the-small-files-problem.

Regards,
Jeff

2009/5/7 Piotr Praczyk <[EMAIL PROTECTED]>

> If you want to use many small files, they probably have the same
> purpose and structure?
> Why not use HBase instead of raw HDFS? Many small files would be packed
> together and the problem would disappear.
>
> cheers
> Piotr
>
> 2009/5/7 Jonathan Cao <[EMAIL PROTECTED]>
>
> > There are at least two design choices in Hadoop that have implications for
> > your scenario.
> > 1. All the HDFS metadata is stored in name node memory -- the memory size
> > is one limitation on how many "small" files you can have.
> >
> > 2. The efficiency of the map/reduce paradigm dictates that each
> > mapper/reducer job has enough work to offset the overhead of spawning the
> > job. It relies on each task reading a contiguous chunk of data (typically
> > 64MB); your small-file situation will change those efficient sequential
> > reads into a larger number of inefficient random reads.
> >
> > Of course, small is a relative term?
> >
> > Jonathan
> >
> > 2009/5/6 陈桂芬 <[EMAIL PROTECTED]>
> >
> > > Hi:
> > >
> > > In my application, there are many small files. But Hadoop is designed
> > > to deal with many large files.
> > >
> > > I want to know why Hadoop doesn’t support small files very well and where
> > > the bottleneck is. And what can I do to improve Hadoop’s capability of
> > > dealing with small files?
> > >
> > > Thanks.
> > >
> > >
> >
>
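
As a rough illustration of the two points Jonathan lists above, the sketch below puts numbers on them. It is only a back-of-the-envelope estimate: it assumes the commonly quoted rule of thumb of about 150 bytes of name node memory per file or block object, one map task per block (and at least one per file), and a made-up 640 GB data set. The function names are hypothetical and nothing here is taken from Hadoop's own code.

```python
import math

# Assumed rule of thumb, not a measured value: bytes of name node
# memory per file object or block object.
BYTES_PER_NAMENODE_OBJECT = 150
# 64MB, the typical chunk size cited in the thread.
BLOCK_SIZE = 64 * 1024 * 1024


def blocks_for(file_size):
    """Blocks occupied by a file of file_size bytes.

    Simplification: every file is counted as at least one block.
    """
    return max(1, math.ceil(file_size / BLOCK_SIZE))


def estimate(num_files, file_size):
    """Return (namenode_bytes, map_tasks) for num_files equally sized files."""
    blocks = num_files * blocks_for(file_size)
    # One name node object per file plus one per block.
    namenode_bytes = (num_files + blocks) * BYTES_PER_NAMENODE_OBJECT
    # Assumption: one map task per block, so at least one per file.
    map_tasks = blocks
    return namenode_bytes, map_tasks


if __name__ == "__main__":
    total = 640 * 1024 ** 3  # hypothetical 640 GB of input data
    small = estimate(total // (64 * 1024), 64 * 1024)  # stored as 64KB files
    large = estimate(total // BLOCK_SIZE, BLOCK_SIZE)  # stored as 64MB files
    print("64KB files: %d bytes of name node memory, %d map tasks" % small)
    print("64MB files: %d bytes of name node memory, %d map tasks" % large)
```

With the same total amount of data, the small-file layout needs roughly a thousand times more name node objects and map tasks under these assumptions, which is the effect both design choices point to.
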
Edward Capriolo 2009-05-07, 14:52
jason hadoop 2009-05-07, 15:24
Rasit OZDAS 2009-05-13, 05:31