

Re: HBaseClient.call() hang
Don't increase the regionserver (RS) timeout to work around this issue. What
is your block size? And can you paste your JVM options here?

I also ran into a long GC problem, but after I tuned the JVM options it works
very well now.
On Tue, Dec 18, 2012 at 1:18 AM, Bryan Keller <[EMAIL PROTECTED]> wrote:

> It seems there was a cascading effect. The regionservers were busy with
> scanning a table, which resulted in some long GCs. The GCs were long
> enough to trigger the Zookeeper timeout on at least one regionserver, which
> resulted in the regionserver shutting itself down. This in turn caused the
> Object.wait() call to get stuck, and it only exited after the very long RPC
> timeout.
>
> I have done a fair amount of work optimizing the GCs, and I increased the
> regionserver timeouts, which should help with the regionserver shutdowns.
> But if a regionserver does shut down for some other reason, this will still
> result in the Object.wait() hang.
>
> One approach might be to have the regionservers send back a keep-alive, or
> progress, message during a scan, and that message would reset the RPC
> timer. The regionserver could do this every x number of rows processed
> server-side. Then the RPC timeout could be something more sensible rather
> than being set to the longest time it takes to scan a region.
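
To make the keep-alive idea above concrete, here is a minimal sketch of an inactivity deadline that gets pushed forward each time a progress message arrives. This is not HBase code: the class `ProgressDeadline` and its methods are hypothetical names, and the caller passes in the current time so the logic is easy to follow and test.

```java
/**
 * Hypothetical sketch: an inactivity deadline that a progress (keep-alive)
 * message pushes forward. Not HBase code; all names are illustrative.
 */
final class ProgressDeadline {
    private final long timeoutMillis;
    private long lastActivityMillis;

    ProgressDeadline(long timeoutMillis, long startMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        this.timeoutMillis = timeoutMillis;
        this.lastActivityMillis = startMillis;
    }

    /** Called when the server reports progress or returns data; restarts the timer. */
    synchronized void onProgress(long nowMillis) {
        lastActivityMillis = Math.max(lastActivityMillis, nowMillis);
    }

    /** Milliseconds left before the call should be treated as timed out (never negative). */
    synchronized long remainingMillis(long nowMillis) {
        return Math.max(0L, lastActivityMillis + timeoutMillis - nowMillis);
    }

    /** True once a full timeout has passed with no progress. */
    synchronized boolean isExpired(long nowMillis) {
        return remainingMillis(nowMillis) == 0L;
    }
}
```

With something like this, the timeout only has to cover the longest gap between two progress messages, not the longest time it takes to scan a whole region.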
>
> HBASE-5416 looks useful. It will make scans faster, so although the problem
> I'm encountering will still be present, I could perhaps set the RPC timeout
> a bit lower. HBASE-6313 might fix the hang, in which case I could live with
> the longer RPC timeout setting.
>
>
> On Dec 14, 2012, at 9:49 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > Bryan:
> >
> > bq. My only thought would be to forego using filters
> > Please keep using filters.
> >
> > Sergey and I are working on HBASE-5416: Improve performance of scans with
> > some kind of filters
> > This feature allows you to specify one column family as being essential.
> > The other column family is only returned to the client when the essential
> > column family matches. I wonder if this may be of help to you.
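
For readers unfamiliar with the essential column family idea, here is a rough plain-Java illustration, assuming a row with two families where the second one is expensive to read. It is not the HBase API or the HBASE-5416 patch; `EssentialFamilyScan`, `Row` and `scan` are made-up names. The filter runs on the essential family first, and the other family is loaded only for rows that match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

/**
 * Hypothetical illustration of the "essential column family" idea:
 * filter on one family first, and load the other family only for matching rows.
 * Plain Java, not the HBase API.
 */
final class EssentialFamilyScan {
    /** A row whose second family is expensive to read, so it is loaded lazily. */
    static final class Row {
        final String essentialValue;
        final Supplier<String> otherFamilyLoader;

        Row(String essentialValue, Supplier<String> otherFamilyLoader) {
            this.essentialValue = essentialValue;
            this.otherFamilyLoader = otherFamilyLoader;
        }
    }

    /** Returns "essential|other" per matching row; non-matching rows never invoke the loader. */
    static List<String> scan(List<Row> rows, Predicate<String> filter) {
        List<String> results = new ArrayList<>();
        for (Row row : rows) {
            if (filter.test(row.essentialValue)) {
                results.add(row.essentialValue + "|" + row.otherFamilyLoader.get());
            }
        }
        return results;
    }
}
```

The saving comes from the rows that do not match: their second family is never read at all.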
> >
> > You mentioned a regionserver going down or being busy. I assume the
> > regionserver(s) did not go down often. For the busy regionserver, did you
> > try jstack'ing the regionserver process?
> >
> > Thanks
> >
> > On Fri, Dec 14, 2012 at 2:59 PM, Bryan Keller <[EMAIL PROTECTED]> wrote:
> >
> >> I have encountered a problem with HBaseClient.call() hanging. This
> >> occurs when one of my regionservers goes down while performing a table
> >> scan.
> >>
> >> What exacerbates this problem is that the scan I am performing uses
> >> filters, and the region size of the table is large (4gb). Because of
> >> this, it can take several minutes for a row to be returned when calling
> >> scanner.next(). Apparently there is no keep alive message being sent
> >> back to the scanner while the region server is busy, so I had to
> >> increase the hbase.rpc.timeout value to a large number (60 min),
> >> otherwise the next() call will timeout waiting for the regionserver to
> >> send something back.
> >>
> >> The result is that this HBaseClient.call() hang is made much worse,
> >> because it won't time out for 60 minutes.
> >>
> >> I have a couple of questions:
> >>
> >> 1. Any thoughts on why the HBaseClient.call() is getting stuck? I
> >> noticed that call.wait() is not using any timeout so it will wait
> >> indefinitely until interrupted externally
> >>
> >> 2. Is there a solution where I do not need to set hbase.rpc.timeout to a
> >> very large number? My only thought would be to forego using filters and
> >> do the filtering client side, which seems pretty inefficient
> >>
> >> Here is a stack dump of the thread that was hung:
> >>
> >> Thread 10609: (state = BLOCKED)
> >> - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
> >> - java.lang.Object.wait() @bci=2, line=485 (Interpreted frame)
> >> - org.apache.hadoop.hbase.ipc.HBaseClient.call(org.apache.hadoop.io.Writable,
> >> java.net.InetSocketAddress, java.lang.Class,