HDFS >> mail # user >> Lots of Different Kind of Datanode Errors


jeff whiting 2010-06-04, 15:56
Todd Lipcon 2010-06-04, 16:01
Jeff Whiting 2010-06-04, 16:37
Todd Lipcon 2010-06-04, 19:03
Alex Kozlov 2010-06-04, 19:18
Allen Wittenauer 2010-06-04, 19:19
Jeff Whiting 2010-06-07, 17:02
Re: Lots of Different Kind of Datanode Errors
The current synchronization in FSDataset seems not quite right. Applying what amounted to Todd's patch, which modifies FSDataset to use reentrant read-write locks, cleared up that type of problem for us.
  - Andy
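
For readers unfamiliar with the idea, here is a minimal sketch of guarding a block map with a reentrant read-write lock. It is not Todd's patch and not the real FSDataset code: the class BlockRegistry, its methods, and the paths are hypothetical names made up for illustration. The point it tries to show is that the "does this block already exist?" check and the insert happen under one write lock, so two writers cannot both create the same block, while plain lookups share the read lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a datanode's block map; NOT the real FSDataset.
final class BlockRegistry {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<Long, String> blocks = new HashMap<>();

    // Lookups take the shared read lock, so many readers can proceed at once.
    boolean isValid(long blockId) {
        lock.readLock().lock();
        try {
            return blocks.containsKey(blockId);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The existence check and the insert run under a single write lock.
    // Returns false if the block was already registered.
    boolean addIfAbsent(long blockId, String path) {
        lock.writeLock().lock();
        try {
            // Reentrancy: the thread holding the write lock may also
            // acquire the read lock inside isValid() without deadlocking.
            if (isValid(blockId)) {
                return false;
            }
            blocks.put(blockId, path);
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        BlockRegistry registry = new BlockRegistry();
        System.out.println(registry.addIfAbsent(1L, "/data/blk_1")); // true
        System.out.println(registry.addIfAbsent(1L, "/data/blk_1")); // false
    }
}
```

Whether this matches the locking granularity of the actual patch is an assumption; it only illustrates the general technique.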
From: Jeff Whiting <[EMAIL PROTECTED]>
Subject: Re: Lots of Different Kind of Datanode Errors
To: [EMAIL PROTECTED]
Date: Monday, June 7, 2010, 10:02 AM
  

 
Thanks for the replies.  I have turned off swap on all the machines to
prevent any swap problems.  I was pounding my hard drives quite hard:
I had 60 simulated clients loading data into HBase as fast as I could,
with a MapReduce export job running at the same time.  Would that
scenario explain some of the errors I was seeing?

Over the weekend, under a more normal load, I haven't seen any
exceptions except for about 6 of these:

2010-06-05 03:46:41,229 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.0.98:50010, storageID=DS-1806250311-192.168.0.98-50010-1274208294562, infoPort=50075, ipcPort=50020):DataXceiver
org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_-1677111232590888964_4471547 is valid, and cannot be written to.
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:999)

The reason the config shows 4096 is that I increased the xceiver
count after the first email message in this thread.

~Jeff

Allen Wittenauer wrote:

  On Jun 4, 2010, at 12:03 PM, Todd Lipcon wrote:

    Hi Jeff,

    That seems like a reasonable config, but the error message you pasted indicated xceivers was set to 2048 instead of 4096.

    Also, in my experience SocketTimeoutExceptions are usually due to swapping. Verify that your machines aren't swapping when you're under load.

  Or doing any other heavy disk IO.
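
One rough way to act on the "verify that your machines aren't swapping" advice is to read the SwapTotal and SwapFree fields from /proc/meminfo on each Linux node. The sketch below does that; the class name SwapCheck and its helper methods are hypothetical. Note that it only reports how much swap is in use at that moment, which is an assumption-laden proxy: it does not measure swap-in/swap-out activity under load.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper; reports swap usage from Linux /proc/meminfo.
final class SwapCheck {
    // Returns the kB value for a key such as "SwapTotal", or -1 if the key is absent.
    static long fieldKb(List<String> lines, String key) {
        for (String line : lines) {
            if (line.startsWith(key + ":")) {
                // Lines look like "SwapTotal:       2097148 kB".
                String[] parts = line.substring(key.length() + 1).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1L;
    }

    // Swap in use = SwapTotal - SwapFree; -1 if either field is missing.
    static long swapUsedKb(List<String> lines) {
        long total = fieldKb(lines, "SwapTotal");
        long free = fieldKb(lines, "SwapFree");
        if (total < 0 || free < 0) {
            return -1L;
        }
        return total - free;
    }

    public static void main(String[] args) throws IOException {
        Path meminfo = Paths.get("/proc/meminfo");
        if (!Files.isReadable(meminfo)) {
            System.out.println("/proc/meminfo not readable (not a Linux host?)");
            return;
        }
        long usedKb = swapUsedKb(Files.readAllLines(meminfo));
        System.out.println("Swap in use (kB): " + usedKb);
    }
}
```

A value that stays at 0 while the cluster is under load suggests the node is not swapping; a growing value is a hint to look closer with the usual system tools.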

  

--
Jeff Whiting
Qualtrics Senior Software Engineer
[EMAIL PROTECTED]
 
Gokulakannan M 2010-06-08, 05:31
Andrew Purtell 2010-06-08, 16:54