ccxixicc 2011-05-30, 02:44
Re: How to create a lot files in HDFS quickly?
Ted Dunning 2011-05-30, 03:52
First, it is virtually impossible to create 100 million files in HDFS because the name node can't hold that many.

Secondly, file creation is bottlenecked by the name node, so files can't be created at more than about 1000 per second (and achieving more than half that rate is somewhat difficult).

Thirdly, you need to check your cluster size because each data node can only store a limited number of blocks (exactly how many differs from version to version of Hadoop). For small clusters this is a more restrictive limit than the size limit of the name node.

Why is it that you need to do this? Perhaps there is a workaround? Consider for instance HAR files:

http://www.cloudera.com/blog/2009/02/the-small-files-problem/

2011/5/29 ccxixicc <[EMAIL PROTECTED]>

> Hi all
> I'm doing a test and need create lots of files ( 100 million ) in HDFS, I
> use a shell script to do this , it's very very slow, how to create a lot
> files in HDFS quickly?
> Thanks
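
To put rough numbers on the first two points, here is a back-of-the-envelope sketch in Python. The creation rates are the ones quoted above (about 1000 per second as a ceiling, about half that as a more realistic figure). The figure of roughly 150 bytes of name node heap per file, directory, or block object is the rule of thumb given in the linked small-files article; it is an assumption, not a measurement, and varies by Hadoop version. The function names are made up for illustration.

```python
# Rough estimate only: the constants below are assumptions, not measurements.

def creation_hours(num_files, creates_per_second):
    """Hours needed to create num_files at a steady name node create rate."""
    return num_files / creates_per_second / 3600.0


def namenode_heap_gb(num_files, blocks_per_file=1, bytes_per_object=150):
    """Approximate name node heap in GB (10^9 bytes).

    Counts one namespace object per file plus one per block; the default
    of 150 bytes per object is a rule of thumb, not an exact figure.
    """
    objects = num_files * (1 + blocks_per_file)
    return objects * bytes_per_object / 1e9


if __name__ == "__main__":
    n = 100_000_000  # the 100 million files from the question
    for rate in (1000, 500):
        print(f"{rate} creates/s: {creation_hours(n, rate):.1f} hours")
    print(f"name node heap: ~{namenode_heap_gb(n):.0f} GB")
```

Under these assumptions, 100 million files take about 27.8 hours at 1000 creates per second and about 55.6 hours at 500, and the name node needs on the order of 30 GB of heap for one-block files (about 15 GB if the files are empty, i.e. `blocks_per_file=0`).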
Konstantin Boudnik 2011-05-30, 04:54
Ian Holsman 2011-05-30, 15:50