Re: datanode timeout
shashwat shriparv 2012-06-27, 20:08
Hey, try increasing the open-files limit setting...
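For reference, a quick way to check the current limit and the socket count on the datanode port (a sketch; port 50010 is taken from the log below, and the example limits.conf values are illustrative, not a recommendation):

```shell
# Show the soft open-file limit for the current shell/user.
ulimit -n

# Count sockets on the datanode port (50010, from the log below).
# lsof may need root to see other users' sockets.
command -v lsof >/dev/null && lsof -i :50010 | wc -l

# To raise the limit persistently, add lines like these to
# /etc/security/limits.conf (example values, tune for your cluster):
#   hdfs  soft  nofile  32768
#   hdfs  hard  nofile  32768
```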
On Mon, Jun 25, 2012 at 10:21 PM, Stack <[EMAIL PROTECTED]> wrote:
> On Mon, Jun 25, 2012 at 9:00 AM, Frédéric Fondement
> <[EMAIL PROTECTED]> wrote:
> > 2012-06-25 10:25:30,646 ERROR
> > org.apache.hadoop.hdfs.server.datanode.DataNode:
> > DatanodeRegistration(10.120.0.5:50010,
> > storageID=DS-1339564791-127.0.0.1-50010-1296151113818, infoPort=50075,
> > ipcPort=50020):DataXceiver
> > java.net.SocketTimeoutException: 480000 millis timeout while waiting for
> > channel to be ready for write. ch :
> > java.nio.channels.SocketChannel[connected local=/10.120.0.5:50010
> > You might have guessed that the local machine is 10.120.0.5. Unsurprisingly,
> > the process on port 50010 is the datanode. Port 42564 changes with each
> > error instance, and seems to correspond to the regionserver process.
> > When I ask for the processes connected to port 50010 using 'lsof -i :50010',
> > I see an impressive number of sockets (~400). Is that normal?
> Is stuff otherwise working? I don't think the exceptions above are an issue.
> HBase opens all of its files in HDFS on startup.
> The above is a timeout on the file because there has been no activity
> in 8 minutes. That's what HDFS does server-side.
> When the dfs client later goes to read on this socket that has been
> closed, the connection will be put back up w/o complaint.
> As for the 400 files, HDFS keeps a running thread or so per open file
> in the datanode (your lsof shows this?).
> > I need to add that current load (requests, IOs, CPU, ...) is rather slow.
> You mean 'low' or slow?
> > I can't find any other error in namenode or regionserver logs.
> > All the best,
> > Frédéric.
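For what it's worth, the 480000 ms (8 minute) figure in the stack trace matches the datanode's socket write timeout, which is configurable in hdfs-site.xml. A sketch, if you wanted to raise it; the value below is purely illustrative:

```xml
<!-- hdfs-site.xml: datanode socket write timeout (default 480000 ms = 8 min).
     Illustrative value; setting it to 0 disables the timeout entirely. -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>960000</value>
</property>
```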