Re: Read speed down after long running
Yi Liang 2011-12-29, 01:54
I don't restart the client processes (in my case, they're thrift servers); I
only restart the master and regionservers. Do you mean I should also restart
the thrift servers?
I'm now checking the code of the thrift server; it seems that it does use
HBaseAdmin somewhere, like in createTable() and deleteTable().
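To make the suspected leak pattern concrete: HBASE-5073 (mentioned downthread) describes HBaseAdmin calls registering internal listeners that are never unregistered, so a long-lived client accumulates them over weeks. The sketch below is a hypothetical stand-in, not HBase code; `LeakyWatcher`, `tableExistsLeaky`, and `tableExistsFixed` are invented names illustrating the register-but-never-unregister pattern versus the fixed one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a watcher that accumulates listeners, as a
// long-lived HBase client reportedly did in HBASE-5073. Not real HBase code.
class LeakyWatcher {
    static final List<Object> listeners = new ArrayList<>();

    // Buggy pattern: register a listener for the check, never unregister it.
    static boolean tableExistsLeaky(String table) {
        listeners.add(new Object()); // leaked on every call
        return true;                 // placeholder result
    }

    // Fixed pattern: always unregister the listener when the call finishes.
    static boolean tableExistsFixed(String table) {
        Object listener = new Object();
        listeners.add(listener);
        try {
            return true;             // placeholder result
        } finally {
            listeners.remove(listener);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) tableExistsLeaky("t");
        System.out.println("leaked listeners: " + listeners.size()); // 1000
        listeners.clear();
        for (int i = 0; i < 1000; i++) tableExistsFixed("t");
        System.out.println("after fix: " + listeners.size());        // 0
    }
}
```

If the thrift servers call createTable()/deleteTable() through a pattern like the first one, restarting only the master and regionservers would not release anything the client side has accumulated, which would match the symptoms.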
I don't see any clue when checking the regionservers with jstack; which
states/threads should I check more carefully? When the problem occurs, we see
higher IO than usual, while memory and network look OK.
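One way to read a regionserver thread dump is to look at the IPC handler threads (the same threads that appear in the hlog warning below) and count how many are BLOCKED: many blocked handlers suggest lock contention, while mostly RUNNABLE handlers with high cpu_wio point at slow disk reads. A sketch, using a tiny fabricated dump as a stand-in (in production, produce the file with `jstack <regionserver-pid> > rs.jstack`):

```shell
# Fabricated two-thread sample dump; real dumps come from jstack.
cat > rs.jstack <<'EOF'
"IPC Server handler 52 on 60020" daemon prio=10 tid=0x1 nid=0x2 waiting
   java.lang.Thread.State: BLOCKED (on object monitor)
"IPC Server handler 53 on 60020" daemon prio=10 tid=0x3 nid=0x4 runnable
   java.lang.Thread.State: RUNNABLE
EOF

# For each IPC handler thread, grab its state line and count BLOCKED ones.
grep -A1 '"IPC Server handler' rs.jstack | grep -c 'BLOCKED'
```

Taking two or three dumps a few seconds apart and comparing which handlers stay in the same state is usually more telling than a single snapshot.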
Thank you for your suggestions!
On Wed, Dec 28, 2011 at 4:21 PM, Gaojinchao <[EMAIL PROTECTED]> wrote:
> I think you need to check the thread dumps (client and RS) and the resources
> (memory, IO and network) of your cluster.
> From: Lars H [mailto:[EMAIL PROTECTED]]
> Sent: December 28, 2011, 0:32
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: Read speed down after long running
> When you restart HBase are you also restarting the client process?
> Are you using HBaseAdmin.tableExists?
> If so, you might be running into HBASE-5073.
> -- Lars
> Yi Liang <[EMAIL PROTECTED]> schrieb:
> >Hi all,
> >We're running HBase 0.90.3 for a read-intensive application.
> >We find that after running for a long time (2 weeks, 1 month, or longer),
> >the read speed becomes much lower.
> >For example, a thrift get_rows operation fetching 20 rows (about 4 KB per
> >row) can take >2 seconds, sometimes even >5 seconds. When it happens, we
> >can see cpu_wio stay at about 10.
> >But if we restart HBase (only the master and regionservers) with
> >stop-hbase.sh and start-hbase.sh, the read speed returns to normal
> >immediately, which is <200 ms for every get_rows operation, and cpu_wio
> >drops to about 2.
> >When the problem appears, there are no exceptions in the logs and no
> >flush/compaction, nothing abnormal except occasional warnings like:
> >2011-12-27 15:50:20,307 WARN
> >IPC Server handler 52 on 60020 took 1546 ms appending an edit to hlog;
> >editcount=1, len~=9.8k
> >Our cluster has 10 region servers, each with a 25 GB heap, 64% of which is
> >used for the cache. There are some M/R jobs running continuously in another
> >cluster to feed data into this HBase. Every night we do a flush and major
> >compaction; usually there's no flush or compaction in the daytime.
> >Could anybody explain why the read speed becomes lower after long running,
> >and why it returns to normal immediately after restarting HBase?
> >Any advice will be highly appreciated.
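As an aside on the setup described above: the "64% of heap used for cache" figure would normally correspond to the block-cache fraction in hbase-site.xml. A sketch, assuming the `hfile.block.cache.size` property that HBase 0.90 uses for this (the default is much lower, around 0.2):

```xml
<!-- hbase-site.xml: fraction of the regionserver heap given to the
     block cache; 0.64 matches the 64% figure mentioned above. -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.64</value>
</property>
```

With 16 GB of a 25 GB heap devoted to cache, GC behavior over weeks of uptime is also worth ruling out when reads slow down gradually and a restart fixes them.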