Re: Client receives SocketTimeoutException (CallerDisconnected on RS)
Adrien Mogenet 2012-08-23, 18:02
1/ I quickly checked the GC logs and saw nothing. Since I need very
fast lookup I set the zookeeper.session.timeout parameter to 10s to
consider the RS as dead after very short pauses, and that did not
2/ I did not check but I don't think I ran out of sockets since the
ulimit has been set very high, but I'll check!
3/ The benchmark can launch several R/W threads, but even the simplest
program leads to my issue:
Configuration config = HBaseConfiguration.create();
HTable table = new HTable(config, "test");
for (<1, 10, 100 or 1000>)
4/ I will share more logs tomorrow to dig deeper; I personally need a
long STW-pause :-)
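
For point 2/, the CLOSE_WAIT check suggested in the quoted reply below can be scripted. This is only a minimal sketch, not something from the thread: the class name CloseWaitCounter and its method are hypothetical, and it assumes that lsof prints the TCP state in parentheses at the end of each socket line.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of HBase): counts sockets reported as CLOSE_WAIT.
public class CloseWaitCounter {

    // Assumes each TCP line from lsof ends with its state, e.g. "(CLOSE_WAIT)".
    static long countCloseWait(List<String> lsofLines) {
        long count = 0;
        for (String line : lsofLines) {
            if (line.trim().endsWith("(CLOSE_WAIT)")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Read lsof output from standard input, one socket per line.
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }
        System.out.println("CLOSE_WAIT sockets: " + countCloseWait(lines));
    }
}
```

It could be fed with something like `lsof -n -P -i TCP | java CloseWaitCounter` on the client host and on a region server while the timeouts occur; a count that keeps growing would point to the socket exhaustion described in the quoted reply.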
On Thu, Aug 23, 2012 at 7:49 PM, N Keywal <[EMAIL PROTECTED]> wrote:
> Hi Adrien,
> It would also help if you could share the client code (number of threads, regions,
> whether it is a set of single gets or multi gets, this kind of
> On Thu, Aug 23, 2012 at 7:40 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>> Hi Adrien,
>> I would love to see the region server side of the logs while those
>> socket timeouts happen, also check the GC log, but one thing people
>> often hit while doing pure random read workloads with tons of clients
>> is running out of sockets because they are all stuck in CLOSE_WAIT.
>> You can check that by using lsof. There are other discussions on this
>> mailing list about it.
>> On Thu, Aug 23, 2012 at 10:24 AM, Adrien Mogenet
>> <[EMAIL PROTECTED]> wrote:
>>> Hi there,
>>> While I'm performing read-intensive benchmarks, I'm seeing a storm of
>>> "CallerDisconnectedException" in certain RegionServers. As the
>>> documentation says, my client received a SocketTimeoutException
>>> (60000ms etc...) at the same time.
>>> It's always happening and I get very poor read performance (from 10
>>> to 5000 reads/sec) on a 10-node cluster.
>>> My benchmark consists of several iterations launching 10, 100 and 1000
>>> Get requests on a given random rowkey with a single CF/qualifier.
>>> I'm using HBase 0.94.1 (a few commits before the official stable
>>> release) with Hadoop 1.0.3.
>>> Bloom filters have been enabled (at the rowkey level).
>>> I cannot find very clear information about these exceptions. From the
>>> reference guide:
>>> (...) you should consider digging in a bit more if you aren't doing
>>> something to trigger them.
>>> Well... could you help me dig? :-)