Hadoop >> mail # user >> Inserting many small files into HBase


Inserting many small files into HBase
Hi,

I'm planning to crawl thousands of news RSS feeds via MapReduce and save
each news article directly into HBase.

My concern is that Hadoop does not work well with a large number of
small files. If I insert every single news article (each of which is
small) into HBase, without separately storing it in HDFS, I might end up
with millions of files that are only several kilobytes in size.

Or does HBase somehow automatically append the news articles into a
single file, so that there would be only a few large files?

Ed
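To make the second possibility in the question concrete, here is a toy Python model of "appending many small articles into a few large files": records are buffered into container objects, keyed by a row key, until a size cap is reached, and then a new container is started. The `Container` class, `pack` and `lookup` functions, the 4 KiB article size and the 16 MiB cap are all hypothetical choices for illustration. This is a sketch of the idea being asked about, not a description of how HBase actually lays out data on HDFS.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical stand-in for one large file holding many small records."""
    records: dict = field(default_factory=dict)  # row key -> article bytes
    size: int = 0  # total payload bytes stored in this container


def pack(articles, target_bytes):
    """Append (key, body) pairs into containers capped at target_bytes.

    A new container is started only when the next article would push the
    current one past the cap; an oversized article still gets its own slot.
    """
    containers = [Container()]
    for key, body in articles:
        cur = containers[-1]
        if cur.records and cur.size + len(body) > target_bytes:
            cur = Container()
            containers.append(cur)
        cur.records[key] = body
        cur.size += len(body)
    return containers


def lookup(containers, key):
    """Find an article by key; each record stays individually addressable."""
    for c in containers:
        if key in c.records:
            return c.records[key]
    return None


if __name__ == "__main__":
    # Assumed workload: 10,000 articles of 4 KiB each.
    articles = [(f"article-{i:05d}", b"x" * 4096) for i in range(10_000)]
    containers = pack(articles, target_bytes=16 * 1024 * 1024)
    print("one file per article:", len(articles), "files")
    print("packed:", len(containers), "files")
```

Under these assumptions each 16 MiB container holds 4,096 articles, so the 10,000 articles fit in 3 containers instead of 10,000 separate files, while `lookup` can still retrieve any single article by its key.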