Hadoop >> mail # user >> Datanode error

RE: Datanode error
I am sorry, but I received an error when I first sent this message to the list, and all the responses went to my junk mail folder. So I sent it again, and only then noticed your emails.

>Please do also share if you're seeing an issue that you think is
>related to these log messages.

My datanodes do not have any major problem, but my region servers keep getting shut down by
timeouts, and I think it is related to the datanodes. I have already tried many different
configurations, but they keep "crashing". I asked on the HBase list, but we could not find
anything (the RSs seem healthy). We have 10 RSs and they get shut down 7 times per day.

So I thought maybe you guys could find what is wrong with my system.

Thanks again,
-----Original Message-----
From: Raj Vishwanathan [mailto:[EMAIL PROTECTED]]
Sent: Friday, July 20, 2012 14:38
Subject: Re: Datanode error

It could also be due to network issues, or the limits on the number of sockets or the number of threads may be too low.
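[Editor's note: Raj's suggestion can be checked quickly on each node. A minimal diagnostic sketch, assuming a Linux host and that it is run as the user that owns the DataNode process; the limits shown are common culprits for socket/thread exhaustion, not a confirmed diagnosis for this cluster:]

```shell
# Per-process open-file-descriptor limit (each TCP socket consumes one fd)
ulimit -n

# Per-user process limit (each Java thread counts as a process here);
# not supported by every shell, hence the fallback
ulimit -u 2>/dev/null || echo "ulimit -u: not supported by this shell"

# Kernel-wide file handle ceiling (Linux only)
cat /proc/sys/fs/file-max 2>/dev/null || echo "file-max: not available on this system"
```

If `ulimit -n` is at the old default of 1024, a busy DataNode serving many HBase region servers can exhaust it and start dropping connections.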


> From: Harsh J <[EMAIL PROTECTED]>
>Sent: Friday, July 20, 2012 9:06 AM
>Subject: Re: Datanode error
>These all seem to be timeouts from clients when they wish to read a
>block and drops from clients when they try to write a block. I wouldn't
>think of them as critical errors. Aside from being worried that a DN is
>logging these, are you noticing any usability issue in your cluster? If
>not, I'd simply blame this on stuff like speculative tasks, region
>servers, general HDFS client misbehavior, etc.
>Please do also share if you're seeing an issue that you think is
>related to these log messages.
>On Fri, Jul 20, 2012 at 6:37 PM, Pablo Musa <[EMAIL PROTECTED]> wrote:
>> Hey guys,
>> I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and working.
>> However my datanodes keep having the same errors, over and over.
>> I googled the problems and tried different flags (e.g.
>> -XX:MaxDirectMemorySize=2G) and different configs (xceivers=8192), but could not solve it.
>> Does anyone know what the problem is and how I can solve it? (The
>> stacktrace is at the end.)
>> I am running:
>> Java 1.7
>> Hadoop 0.20.2
>> Hbase 0.90.6
>> Zoo 3.3.5
>> % top    -> shows low load average (6% most of the time, up to 60%), already considering the number of CPUs
>> % vmstat -> shows no swap at all
>> % sar    -> shows 75% idle CPU in the worst case
>> Hope you guys can help me.
>> Thanks in advance,
>> Pablo
>> 2012-07-20 00:03:44,455 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /DN01:50010, dest: /DN01:43516, bytes: 396288, op: HDFS_READ, cliID: DFSClient_hb_rs_DN01,60020,1342734302945_1342734303427, offset: 54956544, srvID: DS-798921853-DN01-50010-1328651609047, blockid: blk_914960691839012728_14061688, duration: 480061254006
>> 2012-07-20 00:03:44,455 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, ipcPort=50020):Got exception while serving blk_914960691839012728_14061688 to /DN01:
>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/DN01:50010
>>         at [stack trace frames truncated in the archive]
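[Editor's note: the "480000 millis timeout" in the exception matches the DataNode's default socket write timeout of 8 minutes, and the 480061254006 ns duration in the clienttrace line is the same ~480 s. For reference, a sketch of the hdfs-site.xml settings discussed in this thread; the values are illustrative, matching those Pablo mentions, not recommended tuning:]

```xml
<!-- hdfs-site.xml (Hadoop 0.20.x); illustrative values from the thread -->
<property>
  <!-- Maximum number of block-transfer threads ("xceivers") per DataNode.
       The 0.20.x property name uses the historical misspelling "xcievers". -->
  <name>dfs.datanode.max.xcievers</name>
  <value>8192</value>
</property>
<property>
  <!-- DataNode socket write timeout in milliseconds; the default of
       480000 (8 minutes) is what appears in the exception above. -->
  <name>dfs.datanode.socket.write.timeout</name>
  <value>480000</value>
</property>
```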