Lease does not exist exceptions
Eran Kutner 2011-10-18, 10:28
Hi,
I'm having a problem when running map/reduce on a table with about 500
regions.
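
For context, the job is set up roughly like this with TableMapReduceUtil (the class names, table name and caching value below are simplified placeholders, not the exact code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class ScanJob {
  // Placeholder mapper; the real one does the actual per-row work.
  static class MyMapper extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx) {
      // per-row processing goes here
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "scan-my-table");
    job.setJarByClass(ScanJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // rows fetched per scanner next() RPC; as far as I
                                 // understand, each batch has to be consumed before
                                 // the region server's scanner lease times out
    scan.setCacheBlocks(false);  // usually off for full-table MR scans

    TableMapReduceUtil.initTableMapperJob(
        "my_table",              // placeholder table name
        scan, MyMapper.class,
        ImmutableBytesWritable.class, Result.class, job);
    job.setNumReduceTasks(0);    // map-only
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
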
The MR job shows this kind of exception:
11/10/18 06:03:39 INFO mapred.JobClient: Task Id : attempt_201110030100_0086_m_000062_0, Status : FAILED
org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-334679770697295011' does not exist
        at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1845)
        at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:83)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:1)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1019)
        at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1151)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:149)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:142)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)

The HBase logs are full of these:
2011-10-18 06:07:01,425 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.regionserver.LeaseException: lease '3475143032285946374' does not exist
        at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1845)
        at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
and the datanode logs have a few of these (far fewer, it seems, than the HBase errors):
2011-10-18 06:16:42,550 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.1.104.4:50010, storageID=DS-15546166-10.1.104.4-50010-1298985607414, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.1.104.4:50010 remote=/10.1.104.1:57232]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:214)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:114)

I've increased all the relevant limits I know of (which were high to begin with), so now I have 64K file descriptors and dfs.datanode.max.xcievers is 8192.
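
For reference, the xcievers value is just the usual property in hdfs-site.xml on each datanode:

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>8192</value>
</property>
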
I've restarted everything in the cluster to make sure all the processes picked up the new configuration, but I still get those errors. They always begin when the map phase is around 12-14%, and eventually the job fails at ~50%.
Running random scans against the same HBase table while the job is running seems to work fine.
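The check is nothing fancy, roughly this (table name and start row are placeholders):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomScanCheck {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "my_table"); // placeholder name
    Scan scan = new Scan(Bytes.toBytes(args[0]));  // start row passed on the command line
    scan.setCaching(10);
    ResultScanner scanner = table.getScanner(scan);
    try {
      Result r;
      int n = 0;
      // read a handful of rows starting from the given key, then stop
      while (n++ < 10 && (r = scanner.next()) != null) {
        System.out.println(Bytes.toStringBinary(r.getRow()));
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}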

I'm using Hadoop 0.20.2+923.97-1 from CDH3 and HBase 0.90.4, compiled from the branch code a while ago.

Is there any other setting I'm missing, or any other ideas about what could be causing this?

Thanks.

-eran