HDFS blocks are stored as files in the underlying filesystem of your
datanodes. Those files only take as much space as the data they hold,
so if you store 10 MB in a file and your block size is 128 MB, you
still only use 10 MB on disk (times 3 with default replication).
However, the namenode keeps metadata for every file and block in
memory, so a large number of small files does add real overhead there.
If you can merge small files into larger ones, it's best practice to
do so.
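To see why the namenode side matters, here's a rough back-of-envelope
sketch. It assumes the commonly cited figure of roughly 150 bytes of
namenode heap per namespace object (file or block); the exact cost
varies by Hadoop version, and `namenode_heap_bytes` is just an
illustrative helper, not a Hadoop API:

```python
# Assumed average heap cost per file/block object on the namenode.
# The ~150-byte figure is a rule of thumb, not a measurement.
BYTES_PER_OBJECT = 150

def namenode_heap_bytes(num_files, blocks_per_file=1):
    """Estimate namenode heap consumed by num_files files."""
    # one object for the file's metadata plus one per block
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT

# 1,000,000 small files, one block each:
small = namenode_heap_bytes(1_000_000)
# the same data merged into 10,000 larger files (still one block each):
merged = namenode_heap_bytes(10_000)

print(f"1M small files: ~{small / 1024**2:.0f} MB of namenode heap")
print(f"10K merged files: ~{merged / 1024**2:.0f} MB of namenode heap")
```

So merging cuts the namenode's memory footprint roughly in proportion
to the reduction in file count, even though the data on the datanodes
takes the same space either way.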
On Tue, Sep 20, 2011 at 9:54 PM, hao.wang <[EMAIL PROTECTED]> wrote:
> Hi All:
> I have lots of small files stored in HDFS. My HDFS block size is 128M,
> and each file is significantly smaller than that. I want to know
> whether each small file still uses 128M in HDFS?