-
Decommissioning a datanode takes forever
Ben Kim 2013-01-21, 02:52
Hi!
I followed the decommissioning guide on the Hadoop HDFS wiki.
The HDFS web UI shows that the decommissioning process has successfully begun.
It started redeploying 80,000 blocks through the Hadoop cluster, but for some reason it stopped at 9059 blocks. I've waited 30 hours and there is still no progress.
Does anyone have any idea?
--
*Benjamin Kim* *benkimkimben at gmail*
-
Re: Decommissioning a datanode takes forever
varun kumar 2013-01-21, 06:05
Hi Ben,
Are there any corrupted blocks in your Hadoop cluster?
Regards, Varun Kumar
-- Regards, Varun Kumar.P
-
Re: Decommissioning a datanode takes forever
Ben Kim 2013-01-22, 00:28
Hi Varun, thank you for the response.
No, there don't seem to be any corrupted blocks in my cluster. I ran "hadoop fsck -blocks /" and it didn't report any corrupted blocks.
However, there are two WARN messages in the namenode log that have been repeating constantly since the decommission started.
- 2013-01-22 09:16:30,908 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log, edits.new files already exists in all healthy directories:
- 2013-01-22 09:12:10,885 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1
There isn't any WARN or ERROR in the decommissioning datanode's log.
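To see which warnings dominate a namenode log like this, the WARN lines can be tallied by message. The following is a minimal sketch, assuming the log layout shown in the two lines above (timestamp, level, logger class, message); the function name `count_warnings` and the `sample` list are made up for illustration and are not part of Hadoop.

```python
import re
from collections import Counter

# Assumed layout, taken from the lines quoted in this thread:
# "<date> <time>,<millis> <LEVEL> <logger class>: <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+): (?P<msg>.*)$"
)


def count_warnings(lines):
    """Return a Counter of WARN messages keyed by (logger, message)."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group("level") == "WARN":
            counts[(m.group("logger"), m.group("msg"))] += 1
    return counts


# Hypothetical input: the two warnings quoted above.
sample = [
    "2013-01-22 09:16:30,908 WARN "
    "org.apache.hadoop.hdfs.server.namenode.FSNamesystem: "
    "Cannot roll edit log, edits.new files already exists in all healthy directories:",
    "2013-01-22 09:12:10,885 WARN "
    "org.apache.hadoop.hdfs.server.namenode.FSNamesystem: "
    "Not able to place enough replicas, still in need of 1",
]

for (logger, msg), n in count_warnings(sample).most_common():
    # Print the count, the short class name, and the message text.
    print(n, logger.rsplit(".", 1)[-1], msg)
```

On a real log, the lines would come from the namenode log file instead of `sample`, for example by passing an open file object to `count_warnings`.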
Ben
-
Re: Decommissioning a datanode takes forever
Ben Kim 2013-01-22, 08:38
UPDATE:
The edit log WARN had nothing to do with the current problem.
However, the replica placement warnings look suspicious. Please have a look at the following logs.
2013-01-22 09:12:10,885 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1
2013-01-22 00:02:17,541 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Block: blk_4844131893883391179_3440513, Expected Replicas: 10, live replicas: 9, corrupt replicas: 0, decommissioned replicas: 1, excess replicas: 0, Is Open File: false, Datanodes having this block: 203.235.211.155:50010 203.235.211.156:50010 203.235.211.145:50010 203.235.211.144:50010 203.235.211.146:50010 203.235.211.158:50010 203.235.211.159:50010 203.235.211.157:50010 203.235.211.160:50010 203.235.211.143:50010 , Current Datanode: 203.235.211.155:50010, Is current datanode decommissioning: true
I have set my replication factor to 3. I don't understand why Hadoop is trying to replicate this block to 10 nodes. I have decommissioned one node, so I currently have 9 nodes in operation. The block will never be replicated to 10 nodes.
I also see that all of the repeated warning messages like the one above are for blk_4844131893883391179_3440513.
How would I delete the block? It's not showing up as a corrupted block in fsck. :(
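The arithmetic behind "it will never be replicated to 10 nodes" can be checked mechanically. Below is a rough sketch, assuming the INFO line format quoted above and relying on the fact that HDFS keeps at most one replica of a block per datanode. The names `parse_block_report` and `can_reach_target` are hypothetical helpers, and the figure of 9 usable datanodes comes from the message above, not from the log.

```python
import re

# Counters to pull out of the "Block: ..." INFO line quoted in this thread.
FIELDS = {
    "expected": r"Expected Replicas: (\d+)",
    "live": r"live replicas: (\d+)",
    "decommissioned": r"decommissioned replicas: (\d+)",
}


def parse_block_report(line):
    """Extract the block ID and replica counters from a block report line."""
    block = re.search(r"Block: (blk_-?\d+_\d+)", line).group(1)
    counts = {name: int(re.search(pattern, line).group(1))
              for name, pattern in FIELDS.items()}
    return block, counts


def can_reach_target(expected, usable_datanodes):
    """A block has at most one replica per datanode, so the target
    is only reachable if there are at least that many usable nodes."""
    return expected <= usable_datanodes


# Shortened copy of the log line above (datanode list omitted).
SAMPLE = (
    "Block: blk_4844131893883391179_3440513, Expected Replicas: 10, "
    "live replicas: 9, corrupt replicas: 0, decommissioned replicas: 1, "
    "excess replicas: 0, Is Open File: false"
)

block, counts = parse_block_report(SAMPLE)
usable = 9  # assumption: nodes still in operation after excluding one
print(block, counts, "reachable:", can_reach_target(counts["expected"], usable))
```

With 10 expected replicas and 9 usable nodes the check fails, which matches the "still in need of 1" warning repeating for this one block.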
BEN
-
Re: Decommissioning a datanode takes forever
Ben Kim 2013-01-22, 14:09
Impatient as I am, I just shut down the cluster and restarted it with an empty exclude file.
When I added the datanode hostname back to the exclude file and ran hadoop dfsadmin -refreshNodes, *the datanode went straight to the dead nodes list* without going through the decommission process.
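For reference, the two steps described here (add the host to the exclude file, then refresh the namenode) could be scripted roughly as follows. This is only a sketch: the helper `add_to_exclude`, the temporary file path, and the hostname `dn10.example.com` are placeholders, and the real exclude file is whichever one the cluster's configuration points to. Only `hadoop dfsadmin -refreshNodes` is taken from the message above.

```python
import os
import tempfile
from pathlib import Path

# Command from this thread; run it on a host with the Hadoop client installed.
REFRESH_COMMAND = ["hadoop", "dfsadmin", "-refreshNodes"]


def add_to_exclude(exclude_path, hostname):
    """Append hostname to the exclude file unless it is already listed.

    Returns True if the file was changed, False otherwise.
    """
    path = Path(exclude_path)
    hosts = path.read_text().split() if path.exists() else []
    if hostname in hosts:
        return False
    hosts.append(hostname)
    # One hostname per line.
    path.write_text("\n".join(hosts) + "\n")
    return True


# Demo against a temporary file so nothing on a real cluster is touched.
demo_path = os.path.join(tempfile.mkdtemp(), "exclude")
if add_to_exclude(demo_path, "dn10.example.com"):
    print("exclude file updated; next step:", " ".join(REFRESH_COMMAND))
```

On a real cluster the last step would be something like `subprocess.run(REFRESH_COMMAND, check=True)` after updating the actual exclude file.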
I'm done for today. Maybe someone else can figure it out by the time I come back tomorrow :)
Best regards, Ben