-
what happens when a datanode rejoins?
Mehul Choube 2012-09-11, 07:14
Hi,
What happens when an existing (not new) datanode rejoins a cluster in the following scenarios:
1. Some of the blocks it was managing are deleted/modified?
2. The size of the blocks is now modified, say from 64MB to 128MB?
3. What if the block replication factor was one (yes, not in most deployments, but say in case), so does the namenode recreate a file once the datanode rejoins?
Thanks, Mehul
-
Re: what happens when a datanode rejoins?
George Datskos 2012-09-11, 07:25
Hi Mehul
> Some of the blocks it was managing are deleted/modified?
Once a datanode has not been in contact for roughly 10 minutes (by default), the namenode asynchronously re-replicates its blocks to other datanodes in order to maintain the replication factor.
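
As a rough illustration of where the "10 minutes" figure comes from, here is a minimal sketch. It assumes the commonly documented rule that a datanode is considered dead after twice the heartbeat recheck interval plus ten heartbeat intervals, with defaults of 5 minutes and 3 seconds. The function and parameter names are made up for this example, and the actual property names vary between Hadoop versions.

```python
def dead_node_timeout_ms(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Time without heartbeats before a datanode is considered dead.

    Assumed rule: 2 * recheck interval + 10 * heartbeat interval.
    The defaults are assumptions (5 minutes and 3 seconds).
    """
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


timeout_ms = dead_node_timeout_ms()
print(timeout_ms / 60_000)  # minutes, with the assumed defaults
```

With those assumed defaults the result is 10.5 minutes, which is usually rounded to "10 minutes" in conversation.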
> The size of the blocks are now modified say from 64MB to 128MB?
Block size is a per-file setting, so new files will use 128MB blocks, but the old ones will remain at 64MB.
> What if the block replication factor was one (yea not in most
> deployments but say incase) so does the namenode recreate a file once
> the datanode rejoins?
(Assuming you didn't perform a decommission.) Blocks that lived only on that datanode will be declared "missing", and the files associated with those blocks cannot be fully read until the datanode rejoins.
George
-
Re: what happens when a datanode rejoins?
George Datskos 2012-09-11, 07:32
Mehul,
Let me make an addition.
> Some of the blocks it was managing are deleted/modified?
Blocks that are deleted in the interim will be deleted on the rejoining node as well, after it rejoins. Regarding the "modified" case, I'd advise against modifying blocks after they have been fully written.
George
-
Re: what happens when a datanode rejoins?
Harsh J 2012-09-11, 08:03
George has answered most of these. I'll just add on:
On Tue, Sep 11, 2012 at 12:44 PM, Mehul Choube <[EMAIL PROTECTED]> wrote:
> 1. Some of the blocks it was managing are deleted/modified?
A DN runs a block report upon start and sends the list of its blocks to the NN. The NN validates them, and if after the report it finds any files that are missing block replicas, it schedules a re-replication from one of the good DNs that still carry them. Blocks modified outside of HDFS fail their stored checksums, so they are treated as corrupt, deleted, and re-replicated in the same manner.
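
The reconciliation described above can be sketched as a toy model. This is not NameNode code: the function name, the data structures, and the `checksum_ok` flag (standing in for whatever mechanism detects the corruption) are all assumptions made for illustration.

```python
def process_block_report(expected, live_replicas, reported, node):
    """Toy reconciliation of one datanode's block report.

    expected:      dict block_id -> target replication (blocks the NN still tracks)
    live_replicas: dict block_id -> set of nodes holding a good replica (updated in place)
    reported:      dict block_id -> True if the replica passes its checksum
    node:          name of the reporting datanode
    Returns (blocks to delete on the node, blocks to re-replicate), both sorted.
    """
    to_delete = []
    for block_id, checksum_ok in reported.items():
        if block_id not in expected:
            # The file was deleted while the node was away.
            to_delete.append(block_id)
        elif not checksum_ok:
            # Modified outside HDFS: treated as corrupt.
            to_delete.append(block_id)
        else:
            live_replicas.setdefault(block_id, set()).add(node)

    to_replicate = []
    for block_id, target in expected.items():
        have = len(live_replicas.get(block_id, set()))
        # Re-replication needs at least one good source replica.
        if 0 < have < target:
            to_replicate.append(block_id)
    return sorted(to_delete), sorted(to_replicate)


expected = {"b1": 2, "b2": 2, "b3": 2}
live = {"b1": {"dn2"}, "b2": {"dn2", "dn3"}, "b3": {"dn2"}}
reported = {"b1": True, "b3": False, "b9": True}
print(process_block_report(expected, live, reported, "dn1"))
```

Here `b9` belongs to a file deleted in the interim and `b3` fails its checksum, so both are removed from the rejoining node, and `b3` is copied again from the good replica on `dn2`.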
> 2. The size of the blocks are now modified say from 64MB to 128MB?
George's got this already. Changing the block size does not impact any existing blocks. It is a per-file metadata property.
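
To make the "per-file" point concrete, here is a small toy model, not HDFS code. The `ToyNamespace` class and its fields are hypothetical; the only idea taken from the thread is that each file records the block size in effect when it was created.

```python
MB = 1024 * 1024


class ToyNamespace:
    """Hypothetical namespace that stores a block size with every file."""

    def __init__(self, default_block_size):
        self.default_block_size = default_block_size
        self.files = {}  # path -> (file size in bytes, block size in bytes)

    def create(self, path, size):
        # The block size is captured at creation time and never changed afterwards.
        self.files[path] = (size, self.default_block_size)

    def block_count(self, path):
        size, block_size = self.files[path]
        return -(-size // block_size)  # ceiling division


ns = ToyNamespace(default_block_size=64 * MB)
ns.create("/old", 200 * MB)
ns.default_block_size = 128 * MB  # the configured default changes later
ns.create("/new", 200 * MB)
print(ns.block_count("/old"), ns.block_count("/new"))
```

The older file keeps its four 64MB-sized blocks, while the file created after the change is split into two blocks.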
> 3. What if the block replication factor was one (yea not in most
> deployments but say incase) so does the namenode recreate a file once the
> datanode rejoins?
Files exist in the NN's metadata (its fsimage/edits persist this). The blocks pertaining to a file exist on the DNs. If the file had a single replica and that replica was lost, then the file's data is lost, and the NameNode will tell you as much in its metrics/fsck.
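
A minimal sketch of the single-replica case, under the assumption that a file is fully readable only when every one of its blocks has at least one replica on a live datanode. The function name and the status strings are illustrative and are not fsck's actual output.

```python
def file_status(block_replicas, live_nodes):
    """block_replicas: list of sets, one per block, naming the nodes holding a replica.
    live_nodes: set of datanodes currently in contact with the namenode."""
    missing = [i for i, nodes in enumerate(block_replicas) if not (nodes & live_nodes)]
    return "readable" if not missing else f"missing blocks {missing}"


# A two-block file with replication factor one, stored only on dn1.
blocks = [{"dn1"}, {"dn1"}]
print(file_status(blocks, {"dn2"}))         # dn1 is away
print(file_status(blocks, {"dn1", "dn2"}))  # dn1 has rejoined with its blocks intact
```

The file's entry in the namespace survives either way. Only the block data depends on the datanode coming back with its replicas intact.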
-- Harsh J