1. Your first guess is right - the file is broken into blocks, and each block
is stored on a set of DataNodes according to the replication factor and the
block placement policy. You can verify this with fsck (example below).
2. It doesn't happen automatically, as far as I know. One has to re-balance
the cluster in this case, using the balancer tool that ships with Hadoop
(see the second example below).
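
To see how a file's blocks are actually distributed, you can point fsck at
it (the path here is just a placeholder):

    hadoop fsck /user/massoud/bigfile -files -blocks -locations

This prints every block of the file together with the DataNodes holding its
replicas, so you can confirm that different blocks end up on different sets
of nodes.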
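To spread existing data onto newly added nodes, run the balancer; a typical
invocation (the threshold value here is just an example, and 10 is also the
default) is:

    bin/start-balancer.sh -threshold 10

The threshold is how far, in percent of capacity, a node's utilization may
deviate from the cluster average before the balancer moves blocks onto or
off of it.
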
On 11/16/09 13:47, Massoud Mazar wrote:
> This is probably a basic question:
> Assuming replication is set to 3, when we store a large file in HDFS, is
> the whole file stored on 3 nodes (even if you have many more nodes), or is
> it broken into blocks with each block written to 3 nodes? (I assume it is
> the latter, so when you have 30 nodes available, each one gets a piece of
> the file, giving better performance when reading the file.)
> My second question is: what happens if we add more nodes to an existing
> cluster? Would any existing blocks be moved to the new nodes to spread the
> data across them?