|
|
-
NPE while obtaining row lock
Yves Langisch 2011-12-13, 07:52
Hi,
from time to time I get a NPE on my regionserver. Apparently it's occurring while obtaining a row lock:
--- 2011-12-13 02:00:19,582 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.26 MB, single=164.82 MB, multi=167.62 MB, memory=42.16 KB 2011-12-13 02:00:20,365 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error obtaining row lock (fsOk: true) java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881) at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2010) at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:1996) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) 2011-12-13 02:00:20,529 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 39.56 MB of total=335.72 MB 2011-12-13 02:00:20,532 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.2 MB, single=177.89 MB, multi=154.62 MB, memory=42.16 KB 2011-12-13 02:00:21,433 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 39.49 MB of total=335.66 MB 2011-12-13 02:00:21,435 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.52 MB, total=296.14 MB, single=172.92 MB, multi=159.46 MB, memory=42.16 KB ---
Any ideas?
Regards Yves
+
Yves Langisch 2011-12-13, 07:52
-
Re: NPE while obtaining row lock
Yves Langisch 2011-12-13, 08:04
Forgot to mention the versions: I'm running HBase 0.90.4 with Hadoop 0.20.205.0.
On Dec 13, 2011, at 8:52 AM, Yves Langisch wrote:
> Hi, > > from time to time I get a NPE on my regionserver. Apparently it's occurring while obtaining a row lock: > > --- > 2011-12-13 02:00:19,582 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.26 MB, single=164.82 MB, multi=167.62 MB, memory=42.16 KB > 2011-12-13 02:00:20,365 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error obtaining row lock (fsOk: true) > java.lang.NullPointerException > at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881) > at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2010) > at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:1996) > at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) > at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) > 2011-12-13 02:00:20,529 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 39.56 MB of total=335.72 MB > 2011-12-13 02:00:20,532 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.2 MB, single=177.89 MB, multi=154.62 MB, memory=42.16 KB > 2011-12-13 02:00:21,433 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 39.49 MB of total=335.66 MB > 2011-12-13 02:00:21,435 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.52 MB, total=296.14 MB, single=172.92 MB, multi=159.46 MB, memory=42.16 KB > --- > > Any ideas? > > Regards > Yves
+
Yves Langisch 2011-12-13, 08:04
-
Re: NPE while obtaining row lock
Stack 2011-12-14, 06:10
On Mon, Dec 12, 2011 at 11:52 PM, Yves Langisch <[EMAIL PROTECTED]> wrote: > Hi, > > from time to time I get a NPE on my regionserver. Apparently it's occurring while obtaining a row lock: > > --- > 2011-12-13 02:00:19,582 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.26 MB, single=164.82 MB, multi=167.62 MB, memory=42.16 KB > 2011-12-13 02:00:20,365 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error obtaining row lock (fsOk: true) > java.lang.NullPointerException > at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
Any chance you are passing a null lockid Yves?
See below:
2005 protected long addRowLock(Integer r, HRegion region) 2006 throws LeaseStillHeldException { 2007 long lockId = -1L; 2008 lockId = rand.nextLong(); 2009 String lockName = String.valueOf(lockId); 2010 rowlocks.put(lockName, r); 2011 this.leases.createLease(lockName, new RowLockListener(lockName, region)); 2012 return lockId; 2013 } St.Ack
+
Stack 2011-12-14, 06:10
-
Re: NPE while obtaining row lock
Yves Langisch 2011-12-16, 16:20
I'm using the async hbase client (1.0) and there is no way to choose a lockId on my own:
<snippet> return database.client().lockRow( new RowLockRequest(TableManager.ID_TABLE_NAME, MAXID_ROW)).join();
</snippet>
Any ideas what else could be wrong here?
Yves
On Dec 14, 2011, at 7:10 AM, Stack wrote:
> On Mon, Dec 12, 2011 at 11:52 PM, Yves Langisch <[EMAIL PROTECTED]> wrote: >> Hi, >> >> from time to time I get a NPE on my regionserver. Apparently it's occurring while obtaining a row lock: >> >> --- >> 2011-12-13 02:00:19,582 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=39.59 MB, total=296.26 MB, single=164.82 MB, multi=167.62 MB, memory=42.16 KB >> 2011-12-13 02:00:20,365 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error obtaining row lock (fsOk: true) >> java.lang.NullPointerException >> at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881) > > Any chance you are passing a null lockid Yves? > > See below: > > 2005 protected long addRowLock(Integer r, HRegion region) > 2006 throws LeaseStillHeldException { > 2007 long lockId = -1L; > 2008 lockId = rand.nextLong(); > 2009 String lockName = String.valueOf(lockId); > 2010 rowlocks.put(lockName, r); > 2011 this.leases.createLease(lockName, new > RowLockListener(lockName, region)); > 2012 return lockId; > 2013 } > > > St.Ack >
+
Yves Langisch 2011-12-16, 16:20
-
Re: NPE while obtaining row lock
Stack 2011-12-17, 23:08
On Fri, Dec 16, 2011 at 8:20 AM, Yves Langisch <[EMAIL PROTECTED]> wrote: > I'm using the async hbase client (1.0) and there is no way to choose a lockId on my own: > > <snippet> > return database.client().lockRow( > new RowLockRequest(TableManager.ID_TABLE_NAME, MAXID_ROW)).join(); > > </snippet> > > Any ideas what else could be wrong here? > Looking at the code on regionserver side, http://svn.apache.org/viewvc/hbase/tags/0.90.4/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?view=markup, down around line 1994, its unlikely the region is null since we should throw NotServingRegionException if can't find region (and we check for null region name a few lines up) so maybe its something in the way we do obtainRowLock on line 1995? Any chance of your instrumenting the regionserver? Adding a bit of debugging and deploying the debugging regionserver? My guess is we haven't seen this before because not many use rowlocks (rowlocks if long-lived and lots of contending clients could freeze you out of the server; each client blocked waiting on rowlock to clear occupies a handler of which there are a bounded number). St.Ack
+
Stack 2011-12-17, 23:08
-
Re: NPE while obtaining row lock
Yves Langisch 2011-12-30, 11:12
Still happens but before I'm going to add some debugging information I'll try to deploy the new version 0.90.5. - Yves On Dec 18, 2011, at 12:08 AM, Stack wrote: > On Fri, Dec 16, 2011 at 8:20 AM, Yves Langisch <[EMAIL PROTECTED]> wrote: >> I'm using the async hbase client (1.0) and there is no way to choose a lockId on my own: >> >> <snippet> >> return database.client().lockRow( >> new RowLockRequest(TableManager.ID_TABLE_NAME, MAXID_ROW)).join(); >> >> </snippet> >> >> Any ideas what else could be wrong here? >> > > Looking at the code on regionserver side, > http://svn.apache.org/viewvc/hbase/tags/0.90.4/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?view=markup, > down around line 1994, its unlikely the region is null since we should > throw NotServingRegionException if can't find region (and we check for > null region name a few lines up) so maybe its something in the way we > do obtainRowLock on line 1995? > > Any chance of your instrumenting the regionserver? Adding a bit of > debugging and deploying the debugging regionserver? > > My guess is we haven't seen this before because not many use rowlocks > (rowlocks if long-lived and lots of contending clients could freeze > you out of the server; each client blocked waiting on rowlock to clear > occupies a handler of which there are a bounded number). > > St.Ack >
+
Yves Langisch 2011-12-30, 11:12
-
Re: NPE while obtaining row lock
Yves Langisch 2012-01-10, 15:51
Still happens with HBase 0.90.5/Hadoop 1.0.0. But I think I have some more insights on this topic. Following an up to date stack trace: java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:986) at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2008) at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) Caused by: java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881) at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2018) at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2004) ... 5 more After checking the source code I've noticed that the value which is going to be put into the HashMap can be null in the case where the waitForLock flag is true or the rowLockWaitDuration is expired (HRegion#internalObtainRowLock, line 2111ff). The latter I think happens in our case as we have heavy load hitting the server. IMHO this case should be handled somehow and must not lead to a NPE. - Yves On Dec 30, 2011, at 12:12 PM, Yves Langisch wrote: > Still happens but before I'm going to add some debugging information I'll try to deploy the new version 0.90.5. > > - > Yves > > On Dec 18, 2011, at 12:08 AM, Stack wrote: > >> On Fri, Dec 16, 2011 at 8:20 AM, Yves Langisch <[EMAIL PROTECTED]> wrote: >>> I'm using the async hbase client (1.0) and there is no way to choose a lockId on my own: >>> >>> <snippet> >>> return database.client().lockRow( >>> new RowLockRequest(TableManager.ID_TABLE_NAME, MAXID_ROW)).join(); >>> >>> </snippet> >>> >>> Any ideas what else could be wrong here? >>> >> >> Looking at the code on regionserver side, >> http://svn.apache.org/viewvc/hbase/tags/0.90.4/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?view=markup, >> down around line 1994, its unlikely the region is null since we should >> throw NotServingRegionException if can't find region (and we check for >> null region name a few lines up) so maybe its something in the way we >> do obtainRowLock on line 1995? >> >> Any chance of your instrumenting the regionserver? Adding a bit of >> debugging and deploying the debugging regionserver? >> >> My guess is we haven't seen this before because not many use rowlocks >> (rowlocks if long-lived and lots of contending clients could freeze >> you out of the server; each client blocked waiting on rowlock to clear >> occupies a handler of which there are a bounded number). >> >> St.Ack >> >
+
Yves Langisch 2012-01-10, 15:51
-
Re: NPE while obtaining row lock
yuzhihong@... 2012-01-10, 16:04
Thanks for the analysis. Do you mind opening a Jira ? On Jan 10, 2012, at 7:51 AM, Yves Langisch <[EMAIL PROTECTED]> wrote: > Still happens with HBase 0.90.5/Hadoop 1.0.0. But I think I have some more insights on this topic. Following an up to date stack trace: > > java.lang.NullPointerException > at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:986) > at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2008) > at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) > at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) > Caused by: java.lang.NullPointerException > at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881) > at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2018) > at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2004) > ... 5 more > > After checking the source code I've noticed that the value which is going to be put into the HashMap can be null in the case where the waitForLock flag is true or the rowLockWaitDuration is expired (HRegion#internalObtainRowLock, line 2111ff). The latter I think happens in our case as we have heavy load hitting the server. > > IMHO this case should be handled somehow and must not lead to a NPE. > > - > Yves > > On Dec 30, 2011, at 12:12 PM, Yves Langisch wrote: > >> Still happens but before I'm going to add some debugging information I'll try to deploy the new version 0.90.5. >> >> - >> Yves >> >> On Dec 18, 2011, at 12:08 AM, Stack wrote: >> >>> On Fri, Dec 16, 2011 at 8:20 AM, Yves Langisch <[EMAIL PROTECTED]> wrote: >>>> I'm using the async hbase client (1.0) and there is no way to choose a lockId on my own: >>>> >>>> <snippet> >>>> return database.client().lockRow( >>>> new RowLockRequest(TableManager.ID_TABLE_NAME, MAXID_ROW)).join(); >>>> >>>> </snippet> >>>> >>>> Any ideas what else could be wrong here? >>>> >>> >>> Looking at the code on regionserver side, >>> http://svn.apache.org/viewvc/hbase/tags/0.90.4/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?view=markup, >>> down around line 1994, its unlikely the region is null since we should >>> throw NotServingRegionException if can't find region (and we check for >>> null region name a few lines up) so maybe its something in the way we >>> do obtainRowLock on line 1995? >>> >>> Any chance of your instrumenting the regionserver? Adding a bit of >>> debugging and deploying the debugging regionserver? >>> >>> My guess is we haven't seen this before because not many use rowlocks >>> (rowlocks if long-lived and lots of contending clients could freeze >>> you out of the server; each client blocked waiting on rowlock to clear >>> occupies a handler of which there are a bounded number). >>> >>> St.Ack >>> >> >
+
yuzhihong@... 2012-01-10, 16:04
-
Re: NPE while obtaining row lock
Stack 2012-01-10, 18:07
On Tue, Jan 10, 2012 at 7:51 AM, Yves Langisch <[EMAIL PROTECTED]> wrote: > IMHO this case should be handled somehow and must not lead to a NPE. >
Agree. Thanks for filing issue. St.Ack
+
Stack 2012-01-10, 18:07
-
Re: NPE while obtaining row lock
Stack 2012-01-22, 22:44
On Tue, Jan 10, 2012 at 10:07 AM, Stack <[EMAIL PROTECTED]> wrote:
> On Tue, Jan 10, 2012 at 7:51 AM, Yves Langisch <[EMAIL PROTECTED]> wrote: > > IMHO this case should be handled somehow and must not lead to a NPE. > > > > I filed HBASE-5249. St.Ack
+
Stack 2012-01-22, 22:44
-
Re: NPE while obtaining row lock
Yves Langisch 2012-01-23, 06:31
I already filed https://issues.apache.org/jira/browse/HBASE-5171. I thought that I've posted the filed issue back to this thread, Sorry about that. - Yves On Jan 22, 2012, at 11:44 PM, Stack wrote: > On Tue, Jan 10, 2012 at 10:07 AM, Stack <[EMAIL PROTECTED]> wrote: > >> On Tue, Jan 10, 2012 at 7:51 AM, Yves Langisch <[EMAIL PROTECTED]> wrote: >>> IMHO this case should be handled somehow and must not lead to a NPE. >>> >> >> > I filed HBASE-5249. > St.Ack
+
Yves Langisch 2012-01-23, 06:31
|
|