
Re: Inserting many small files into HBase
Take a look at this:


Then read the Bigtable paper.
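
In short: HBase does not create one HDFS file per row. Writes are buffered in memory (the MemStore) and flushed to store files (HFiles) on HDFS once the buffer fills up, and compactions later merge those files. Below is a rough toy model of that buffer-then-flush idea in plain Python. It is only a sketch under stated assumptions, not the HBase API: ToyMemStore, demo, the on-disk layout, and the 4 MB threshold are all made up for illustration.

```python
import os
import tempfile


class ToyMemStore:
    """Toy model of buffer-then-flush. Hypothetical; NOT the HBase API."""

    def __init__(self, directory, flush_bytes):
        self.directory = directory
        self.flush_bytes = flush_bytes  # illustrative threshold, not an HBase default
        self.buffer = {}                # row key -> value, held in memory
        self.buffered_bytes = 0
        self.flushed_files = []

    def put(self, row_key, value):
        # Overwriting a key replaces its bytes instead of counting them twice.
        old = self.buffer.get(row_key)
        if old is not None:
            self.buffered_bytes -= len(row_key) + len(old)
        self.buffer[row_key] = value
        self.buffered_bytes += len(row_key) + len(value)
        if self.buffered_bytes >= self.flush_bytes:
            self.flush()

    def flush(self):
        # Write every buffered row, sorted by key, into ONE new file.
        if not self.buffer:
            return
        name = "store-%05d.dat" % len(self.flushed_files)
        path = os.path.join(self.directory, name)
        with open(path, "wb") as f:
            for key in sorted(self.buffer):
                value = self.buffer[key]
                # Made-up length-prefixed record layout.
                f.write(len(key).to_bytes(4, "big"))
                f.write(key)
                f.write(len(value).to_bytes(4, "big"))
                f.write(value)
        self.flushed_files.append(path)
        self.buffer = {}
        self.buffered_bytes = 0


def demo(num_articles=10000, article_bytes=2048, flush_bytes=4 * 1024 * 1024):
    """Insert many small 'articles' and return how many files were written."""
    with tempfile.TemporaryDirectory() as directory:
        store = ToyMemStore(directory, flush_bytes)
        for i in range(num_articles):
            key = ("article-%08d" % i).encode("ascii")
            store.put(key, b"x" * article_bytes)
        store.flush()  # write out whatever is still buffered
        return len(store.flushed_files)


if __name__ == "__main__":
    print("files written:", demo())
```

With these made-up numbers, ten thousand 2 KB articles end up in a handful of files rather than ten thousand. Real HBase also keeps a write-ahead log and runs compactions, both of which this sketch leaves out.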

On Sun, Mar 20, 2011 at 6:39 PM, edward choi <[EMAIL PROTECTED]> wrote:

> Hi,
> I'm planning to crawl thousands of news RSS feeds via MapReduce, and save
> each news article into HBase directly.
> My concern is that Hadoop does not work well with a large number of
> small files,
> and if I insert every single news article (each of which is apparently small)
> into HBase, without separately storing it in HDFS,
> I might end up with millions of files that are only several kilobytes in
> size.
> Or does HBase somehow automatically append each news article into a single
> file, so that it would have only a few large files?
> Ed