RE: scanner lease expired/region server shutdown
Kannan Muthukkaruppan 2010-01-26, 20:56
Some quick answers to the questions. Will follow up later today with more details.

1) The test nodes were 8-CPU machines.

2) The region server log does report a ulimit of 64k.

Mon Jan 25 17:41:22 PST 2010 Starting regionserver on titantest013.ash1.facebook.com
ulimit -n 65535
2010-01-25 17:41:23,605 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: vmInputArguments=[-Xmx12000m, -XX:+HeapDumpOnOutOfMemoryError, -XX:+UseConcMarkSweepGC, -XX:+CMSIncrementalMode, -Dhbase.log.dir=/usr/local/hadoop/logs/HBASE-kannan1, -Dhbase.log.file=hbase-hadoop-regionserver-titantest013.ash1.facebook.com.log, -Dhbase.home.dir=/usr/local/hadoop/HBASE-kannan1/bin/.., -Dhbase.id.str=hadoop, -Dhbase.root.logger=INFO,DRFA, -Djava.library.path=/usr/local/hadoop/HBASE-kannan1/bin/../lib/native/Linux-amd64-64]
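
A quick way to confirm that ulimit line across the daemon logs (a sketch: the log directory comes from the -Dhbase.log.dir argument above, and the hbase-hadoop-*.log file pattern is assumed from the log file name above):

    grep -h "ulimit -n" /usr/local/hadoop/logs/HBASE-kannan1/hbase-hadoop-*.log
    # expect "ulimit -n 65535" (and not 1024) near the top of every
    # master and regionserver log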

3) An aside: we're still using -XX:+CMSIncrementalMode, but I recall reading that for machines with more than 2 cores the recommendation is to not use incremental mode.
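
If we drop it, the change would go in conf/hbase-env.sh; a minimal sketch, assuming the same flags as in the vmInputArguments above with only the incremental-mode flag removed:

    # conf/hbase-env.sh: same JVM options as above, minus -XX:+CMSIncrementalMode,
    # which is not recommended on machines with more than 2 cores
    export HBASE_OPTS="-Xmx12000m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC"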

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Stack
Sent: Tuesday, January 26, 2010 8:32 AM
To: [EMAIL PROTECTED]
Subject: Re: scanner lease expired/region server shutdown

This is suspicious.  Make sure that the second line in the master or
regionserver log, where it prints out the ulimit for the process, shows
64k (and not 1024).  You may not have set it for the DN user, or setting
it on your Linux may be ornery (see
http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6 for a note about
Red Hat-based systems having an upper bound for the system, or
http://wiki.apache.org/hadoop/Hbase/FAQ#A6 for an interesting issue
around kernel limits).
St.Ack
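
A sketch of what raising the limit for the DataNode user looks like on a PAM-based Linux, per the Troubleshooting page linked above (the "hadoop" user name is an assumption):

    # /etc/security/limits.conf
    hadoop  -  nofile  65535
    # /etc/pam.d/login (or the file for whatever login path starts the daemon)
    # must pull in the limits module:
    #   session required pam_limits.so
    # then restart the DataNode from a fresh login so it inherits the new limit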

On Tue, Jan 26, 2010 at 7:10 AM, Kannan Muthukkaruppan
<[EMAIL PROTECTED]> wrote:
>
> Looking further up in the logs (about 20 minutes prior, when the errors first started happening), I noticed the following.
>
> btw, ulimit -a shows that I have "open files" set to 64k. Is that not sufficient? (A way to check the limit the running daemon actually has is sketched after the traces below.)
>
> 2010-01-25 11:10:21,774 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=50020):DataXceiveServer: java.io.IOException: Too many open files
>        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
>        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
>        at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
>        at java.lang.Thread.run(Thread.java:619)
>
> 2010-01-25 11:10:21,566 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=50020):Got exception while serving blk_3332344970774019423_10249 to /10.129.68.212:
> java.io.FileNotFoundException: /mnt/d1/HDFS-kannan1/current/subdir23/blk_3332344970774019423_10249.meta (Too many open files)
>        at java.io.FileInputStream.open(Native Method)
>        at java.io.FileInputStream.<init>(FileInputStream.java:106)
>        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getMetaDataInputStream(FSDataset.java:682)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:97)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>
>
>
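
One caveat on the ulimit -a check above: an interactive shell's limit does not necessarily match what a long-running daemon inherited when it was started. A sketch for inspecting the live DataNode on a reasonably recent Linux kernel (assumes jps is on the PATH and /proc/<pid>/limits is available):

    DN_PID=$(jps | awk '/DataNode/ {print $1}')
    grep "Max open files" /proc/$DN_PID/limits   # the limit the daemon actually runs with
    ls /proc/$DN_PID/fd | wc -l                  # descriptors currently open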
> ________________________________________
> From: Kannan Muthukkaruppan [[EMAIL PROTECTED]]
> Sent: Tuesday, January 26, 2010 7:01 AM
> To: [EMAIL PROTECTED]
> Subject: RE: scanner lease expired/region server shutdown
>
> 1. Yes, it is a 5-node setup.
>
> 1 Name Node/4 Data Nodes. Of the 4 DNs, one is running the HBase Master, and the other three are running region servers. ZK is on all the same 5 nodes; we should ideally have separated this out. The nodes are 16GB, 4-disk machines.