First, it is virtually impossible to create 100 million files in HDFS
because the name node keeps all file metadata in memory and can't hold
that many entries.
Secondly, file creation is bottlenecked by the name node, so files can't
be created at much more than about 1,000 per second (and achieving even
half that rate is somewhat difficult).
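To put that rate in perspective, here is a quick back-of-envelope calculation, using the 1,000/sec upper bound and the more realistic 500/sec figure from above:

```shell
# Rough time to create 100 million files through the name node,
# at an optimistic and a realistic creation rate.
files=100000000
echo "at 1000/s: $(( files / 1000 / 3600 )) hours"   # 27 hours
echo "at  500/s: $(( files / 500  / 3600 )) hours"   # 55 hours
```

So even in the best case you are looking at more than a day of pure name-node time before any data is written.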
Thirdly, you need to check your cluster size, because each data node can
only store a limited number of blocks (exactly how many varies from
version to version of Hadoop). For small clusters this is a more
stringent limit than the metadata capacity of the name node.
Why do you need to do this in the first place?
Perhaps there is a work-around. Consider, for instance, HAR files:
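If the files don't each need their own name-node entry, packing them into a Hadoop Archive collapses millions of metadata entries into a handful. A sketch of the command, assuming the small files already exist under a hypothetical /user/me/smallfiles directory:

```shell
# Pack /user/me/smallfiles into a single HAR under /user/me/archives.
# The paths here are made-up placeholders for illustration.
hadoop archive -archiveName files.har -p /user/me smallfiles /user/me/archives

# The archived files remain readable through the har:// filesystem:
hadoop fs -ls har:///user/me/archives/files.har/smallfiles
```

Note this only helps after the files exist somewhere; it reduces the steady-state name-node load rather than the creation rate.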
2011/5/29 ccxixicc <[EMAIL PROTECTED]>
> Hi all,
> I'm doing a test and need to create lots of files (100 million) in HDFS. I
> use a shell script to do this, but it's very very slow. How can I create a
> lot of files in HDFS quickly?