RE: scanner lease expired/region server shutdown
Dhruba: yes, the "Too many open files" exception is getting reported by the DN process. The same node is also running an HBase region server.

And yes, I confirmed that the xcievers setting is 2048.
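
For reference, that knob lives in hdfs-site.xml on each datanode, and the property name really is spelled "xcievers" in this Hadoop line, so a typo in the name silently leaves the default in place. Something like:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2048</value>
  </property>

(The datanodes need a restart to pick up a change.)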

Regards,
Kannan
-----Original Message-----
From: Dhruba Borthakur [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 26, 2010 10:10 AM
To: [EMAIL PROTECTED]
Subject: Re: scanner lease expired/region server shutdown

This exception is from the DataNode, right? This means that the datanode
process has 32K files open simultaneously; how can that be? For each block
read/write, the datanode has two open files: one for the data and one for the
.meta file where the CRC gets stored.
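
A quick way to see what the DN process actually has open, assuming Linux with
/proc mounted (the jps-based pid lookup is just one convenient option):

  DN_PID=$(jps | awk '/DataNode/ {print $1}')   # pid of the DataNode JVM
  ls /proc/$DN_PID/fd | wc -l                   # total open descriptors
  ls -l /proc/$DN_PID/fd | grep -c socket       # how many of those are sockets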

On the other hand, the datanode is configured via dfs.datanode.max.xcievers
to support 2048 simultaneous read/write requests, right?
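
Back-of-the-envelope, assuming two files plus one client socket per active
xceiver:

  2048 xceivers * 2 files   = 4096 descriptors
  2048 xceivers * 1 socket  = 2048 descriptors
                               ---------------
                               ~6k in steady state

That is nowhere near a 64k "open files" limit, so actually hitting the limit
would point at descriptors being leaked, or at the daemon running with a much
lower limit than the shell reports.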

thanks,
dhruba
On Tue, Jan 26, 2010 at 7:10 AM, Kannan Muthukkaruppan
<[EMAIL PROTECTED]> wrote:

>
> Looking further up in the logs (about 20 minutes before the errors first
> started happening), I noticed the following.
>
> btw, ulimit -a shows that I have "open files" set to 64k. Is that not
> sufficient?
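>
> One caveat on that: the limit the running daemon actually inherited can
> differ from what an interactive shell's ulimit reports, e.g. if the DN was
> started from an init script under another user. Assuming Linux, the live
> values are visible under /proc (the jps pid lookup is just one way in):
>
>   cat /proc/$(jps | awk '/DataNode/ {print $1}')/limits | grep 'open files'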
>
> 2010-01-25 11:10:21,774 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=50020):DataXceiveServer: java.io.IOException: Too many open files
>        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
>        at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
>        at java.lang.Thread.run(Thread.java:619)
>
> 2010-01-25 11:10:21,566 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=50020):Got exception while serving blk_3332344970774019423_10249 to /10.129.68.212:
> java.io.FileNotFoundException: /mnt/d1/HDFS-kannan1/current/subdir23/blk_3332344970774019423_10249.meta (Too many open files)
>        at java.io.FileInputStream.open(Native Method)
>        at java.io.FileInputStream.<init>(FileInputStream.java:106)
>        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getMetaDataInputStream(FSDataset.java:682)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:97)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>
>
>
> ________________________________________
> From: Kannan Muthukkaruppan [[EMAIL PROTECTED]]
> Sent: Tuesday, January 26, 2010 7:01 AM
> To: [EMAIL PROTECTED]
> Subject: RE: scanner lease expired/region server shutdown
>
> 1. Yes, it is a 5-node setup.
>
> 1 NameNode and 4 DataNodes. Of the 4 DNs, one is running the HBase master and
> the other three are running region servers. ZK is on all the same 5 nodes;
> we should ideally have separated this out. The nodes are 16GB, 4-disk machines.
>
> 2. I examined the HDFS datanode log on the same machine around the time the
> problems happened, and saw this:
>
> 2010-01-25 11:33:09,531 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=50020):Got exception while serving blk_5691809099673541164_10475 to /10.129.68.212:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010 remote=/10.129.68.212:47729]
>        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
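>
> The 480000 ms here corresponds to the datanode's socket write timeout, which
> defaults to 8 minutes and is tunable in hdfs-site.xml; if memory serves, the
> property name in this Hadoop line is dfs.datanode.socket.write.timeout:
>
>   <property>
>     <name>dfs.datanode.socket.write.timeout</name>
>     <value>480000</value>
>   </property>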

Connect to me at http://www.facebook.com/dhruba