Re: how blocks are replicated
1. Your first guess is right: the file is 'broken' into blocks, which are then
stored according to the replication policy and other placement rules. (See the
first sketch below.)

2. It doesn't happen automatically, as far as I know. One has to re-balance
the cluster in this case, e.g. by running the HDFS balancer tool. (See the
second sketch below.)

--
Take care,
   Cos

On 11/16/09 13:47, Massoud Mazar wrote:
> This is probably a basic question:
>
> Assuming replication is set to 3, when we store a large file in HDFS, is
> the whole file stored on 3 nodes (even if you have many more nodes), or
> is it broken into blocks with each block written to 3 nodes? (I assume
> it is the latter, so when you have 30 nodes available, each one gets a
> piece of the file, giving better performance when reading the file.)
>
> My second question is: what happens if we add more nodes to an existing
> cluster? Would any existing blocks be moved to the new nodes to spread
> the data across them?
>
> Thanks
>
> Massoud
>