HDFS >> mail # user >> How to create a lot files in HDFS quickly?


Re: How to create a lot files in HDFS quickly?
First, it is virtually impossible to create 100 million files in HDFS
because the name node cannot hold that many file objects in memory.

Secondly, file creation is bottlenecked by the name node, so files cannot
be created at more than about 1,000 per second (and achieving even half
that rate is somewhat difficult).
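For scale, a quick back-of-envelope calculation using the rates above (the
~1,000-per-second figure is the ceiling discussed in this reply, not a
guarantee for any particular cluster):

```python
# Rough time estimate for creating 100 million files through a single
# name node, at the best-case and half-rate figures discussed above.
FILES = 100_000_000

for rate in (1000, 500):  # file creations per second
    seconds = FILES / rate
    hours = seconds / 3600
    print(f"{rate:>5}/s -> {hours:,.1f} hours ({hours / 24:.1f} days)")
```

Even at the optimistic rate this is more than a day of wall-clock time spent
purely on metadata operations, which is why a plain shell loop feels slow.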

Thirdly, you need to check your cluster size, because each data node can
only store a limited number of blocks (exactly how many differs from
version to version of Hadoop).  For small clusters this is a more
stringent limit than the size limit of the name node.

Why is it that you need to do this?

Perhaps there is a work-around?  Consider for instance HAR files:

http://www.cloudera.com/blog/2009/02/the-small-files-problem/
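The core idea behind HAR files (and SequenceFiles) is to pack many small
logical files into one large physical file plus an index, so the name node
tracks one object instead of millions. This toy sketch in plain Python, not
Hadoop code, illustrates that layout; the `pack`/`read` helpers are
hypothetical names for illustration only:

```python
import io

# Toy illustration of the consolidation idea behind HAR/SequenceFile:
# append many small "files" into one buffer and keep an index of
# (offset, length) per name, so one physical file holds them all.
def pack(records):
    buf = io.BytesIO()
    index = {}
    for name, data in records.items():
        index[name] = (buf.tell(), len(data))
        buf.write(data)
    return buf.getvalue(), index

def read(archive, index, name):
    offset, length = index[name]
    return archive[offset:offset + length]

archive, index = pack({"a.txt": b"alpha", "b.txt": b"bravo"})
print(read(archive, index, "b.txt"))  # b'bravo'
```

With this layout, 100 million logical files could cost the namespace only a
handful of archive files plus their index, which is exactly the trade the
Cloudera post above describes.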
2011/5/29 ccxixicc <[EMAIL PROTECTED]>

> Hi all
> I'm doing a test and need to create lots of files (100 million) in HDFS. I
> use a shell script to do this , it's very very slow, how to create a lot
> files in HDFS quickly?
> Thanks
>