HDFS, mail # user - decommissioning nodes help


RE: decommissioning nodes help
Arun Ramakrishnan 2010-07-13, 21:46
I don't know where the problem was. J-D said somewhere that the decommissioning process is well tested and unlikely to have bugs.

Anyway, I just resorted to killing 2 nodes, waiting until fsck reported 100% of blocks replicated to 3, killing 2 more nodes ... and so on.
That worked fine.
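For anyone following the same kill-and-wait approach, here is a minimal sketch of checking the fsck summary before stopping the next pair of nodes. The canned summary lines stand in for real `hadoop fsck /` output; the exact field wording is an assumption (0.20-era format).

```shell
# In a real loop this summary would come from:  hadoop fsck /
# The canned text below is illustrative only (assumed 0.20-era format).
fsck_summary='Total blocks (validated):      12000
Minimally replicated blocks:   12000 (100.0 %)
Under-replicated blocks:       0 (0.0 %)'

# Extract the under-replicated block count from the summary.
under=$(printf '%s\n' "$fsck_summary" |
  awk -F'[:(]' '/Under-replicated/ { gsub(/[ \t]/, "", $2); print $2 }')

if [ "$under" -eq 0 ]; then
  echo "replication caught up - safe to stop the next pair of nodes"
fi
```

Once the count stays at 0 and block counts are stable, the next pair of nodes can be stopped.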

Thanks
Arun

-----Original Message-----
From: Varene Olivier [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 13, 2010 1:32 AM
To: [EMAIL PROTECTED]
Subject: Re: decommissioning nodes help

Are your datanodes double-attached to the network?
If so, they can indeed show up as duplicate entries.
You should also check that your DNS resolution matches the
hostnames of your datanodes.
To work around your issue, you can switch off one datanode at a time
(by killing the process).
The master should notice this and re-replicate blocks to maintain the
replication level.
Do it slowly :) (or you might lose some data)
You can tell the process is over when the I/O from block re-replication
stops.
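A quick CLI way to watch this, sketched below: pull the per-node decommission status out of `hadoop dfsadmin -report`. The report excerpt is canned for illustration (field names match 0.20-era output; hostnames and ports are made up).

```shell
# Real input would come from:  hadoop dfsadmin -report
# This excerpt is canned; hostnames/ports are made-up examples.
report='Name: 10.0.0.5:50010
Decommission Status : Decommission in progress
Name: 10.0.0.6:50010
Decommission Status : Normal'

# Print "node -> status" for each datanode in the report.
status=$(printf '%s\n' "$report" |
  awk '/^Name:/ { node = $2 }
       /^Decommission Status/ { sub(/.*: /, ""); print node, "->", $0 }')
echo "$status"
```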

Cheers
Arun Ramakrishnan wrote:
> That's what I thought.
>
> But this is what I see in -report for the excluded nodes.
>
> **************
> Decommission Status : Normal
> Configured Capacity: 0 (0 KB)
> DFS Used: 0 (0 KB)
> Non DFS Used: 0 (0 KB)
> DFS Remaining: 0(0 KB)
> DFS Used%: 100%
> DFS Remaining%: 0%
> Last contact: Wed Dec 31 16:00:00 PST 1969
> ***************
>
> In the UI, the excluded nodes show up in both the live and dead node lists. And it's been several hours now. The block counts across the nodes are exactly the same.
> The cluster is not accessed by any clients; it's not busy at all.
>
> And I have set dfs.balance.bandwidthPerSec = 2000000 in hdfs-site.xml
>
> Anyway, I think I am lost here. I'm just resorting to the sorta backwardish strategy of killing 2 nodes at a time. At least I know it works.
>
> Thanks
> Arun
>
> -----Original Message-----
> From: Varene Olivier [mailto:[EMAIL PROTECTED]]
> Sent: Friday, July 09, 2010 7:44 AM
> To: [EMAIL PROTECTED]
> Subject: Re: decommissioning nodes help
>
> Hello,
>
> you should see in the Web interface
>
> http://yourDatanodeMaster:50070/
> the status of your node change to Decommissioning;
> when done, it is removed from the list of active nodes
>
> With a huge bandwidth available to perform the sync, the process is very fast,
> so, to answer your other mail, the process might already be done.
>
> You can also check the status of your node via the CLI:
>
> # hadoop dfsadmin -report
>
> Name : ...
> Decommission Status : <StatusOfYourNode>
> ...
>
>
> Hope it helps
>
>
>
> Arun Ramakrishnan wrote:
>> Hi guys
>>
>>  I am stuck in my attempt to remove nodes from HDFS.
>>
>> I followed the steps in https://issues.apache.org/jira/browse/HDFS-1125
>>
>> a)     add node to dfs.hosts.exclude
>>
>> b)      dfsadmin -refreshNodes
>>
>> c)      wait for decom to finish
>>
>> d)     remove node from both dfs.hosts and dfs.hosts.exclude
>>
>>  
>>
>> But after steps a) and b), how do I know when decommissioning is complete?
>>
>> I am in the process of decommissioning 6 nodes and don't want to lose
>> any blocks (rep factor is 3) with a restart.
>>
>>  
>>
>> I also opened https://issues.apache.org/jira/browse/HDFS-1290 if anyone
>> is interested.
>>
>>  
>>
>> Thanks
>>
>> Arun
>>
>>  
>>
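For reference, the a)-d) steps quoted above can be sketched as a shell session. The exclude-file path and hostnames are assumptions for illustration; the namenode-side commands are shown as comments since they need a live cluster.

```shell
# Path is an assumption; dfs.hosts.exclude in hdfs-site.xml must point at it.
EXCLUDES=/tmp/dfs.hosts.exclude

# a) list the nodes to decommission in the exclude file (made-up hostnames)
printf '%s\n' datanode5.example.com datanode6.example.com > "$EXCLUDES"

# b) make the namenode re-read dfs.hosts / dfs.hosts.exclude:
#      hadoop dfsadmin -refreshNodes

# c) poll until each excluded node reports "Decommissioned":
#      hadoop dfsadmin -report | grep 'Decommission Status'

# d) once decommissioned, remove the nodes from both dfs.hosts and the
#    exclude file, then run -refreshNodes again

# count the excluded hosts (prints 2)
grep -c 'example.com' "$EXCLUDES"
```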