|
Jack Levin
2011-04-20, 22:45
Stack
2011-04-21, 07:21
Jack Levin
2011-04-21, 07:45
tsuna
2011-04-21, 21:44
Stack
2011-04-22, 03:28
|
-
issue with thrift server pile up.Jack Levin 2011-04-20, 22:45
Hello -- we have an issue that looks like this. We have php app
front end and thrift servers that live on seperate boxes away from HBASE cluster. Everytime we do compaction on one of our 8 RS servers, we cause a thread pile up on Thrift servers that delay _all_ queries to HBASE. Our usual level of threads on Thrift is around 250 threads, but when we do a single compaction on a single region server, the Thrift threads jump to about 5,000, which makes HBASE to be slow for all regions , not just the one that compacts. How does one solve an issue like that? I know with mysql we can setup slaves and just redirect queries to slave zones that are faster, but what options do we have to avoid HBASE slowdown as a whole? -Jack
-
Re: issue with thrift server pile up.Stack 2011-04-21, 07:21
On Wed, Apr 20, 2011 at 3:45 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
> Hello -- we have an issue that looks like this. We have php app > front end and thrift servers that live on seperate boxes away from > HBASE cluster. Everytime we do compaction on one of our 8 RS servers, > we cause a thread pile up on Thrift servers that delay _all_ queries > to HBASE. What kind of compaction. You are compacting a single region or all regions on this box? Minor or major compaction? What kind of hardware is this? When compaction runs on this machine, it is responsible for upping latency against regions hosted on this box alone or on all boxes; i.e. the compaction load is such that hdfs on all the cluster slows? > Our usual level of threads on Thrift is around 250 threads, > but when we do a single compaction on a single region server, the > Thrift threads jump to about 5,000, which makes HBASE to be slow for > all regions , not just the one that compacts. Can you give us snippets from logs of the regionserver that is compacting over the period where the thrift servers backup? St.Ack > How does one solve an > issue like that? I know with mysql we can setup slaves and just > redirect queries to slave zones that are faster, but what options do > we have to avoid HBASE slowdown as a whole? > > -Jack >
-
Re: issue with thrift server pile up.Jack Levin 2011-04-21, 07:45
On Thu, Apr 21, 2011 at 12:21 AM, Stack <[EMAIL PROTECTED]> wrote:
> On Wed, Apr 20, 2011 at 3:45 PM, Jack Levin <[EMAIL PROTECTED]> wrote: >> Hello -- we have an issue that looks like this. We have php app >> front end and thrift servers that live on seperate boxes away from >> HBASE cluster. Everytime we do compaction on one of our 8 RS servers, >> we cause a thread pile up on Thrift servers that delay _all_ queries >> to HBASE. > > What kind of compaction. You are compacting a single region or all > regions on this box? Minor or major compaction? > Minor compaction for one (single) region on a regionserver. > What kind of hardware is this? 16G Ram, 8 Core box, 6 disks, CPU user/system/io is slow at 5% or less. > >When compaction runs on this machine, > it is responsible for upping latency against regions hosted on this > box alone or on all boxes; i.e. the compaction load is such that hdfs > on all the cluster slows? > > HDFS on all boxes remain fast, as well as the box doing compaction, However all queries (scans, gets) to the same region server get slow. > >> Our usual level of threads on Thrift is around 250 threads, >> but when we do a single compaction on a single region server, the >> Thrift threads jump to about 5,000, which makes HBASE to be slow for >> all regions , not just the one that compacts. > > Can you give us snippets from logs of the regionserver that is > compacting over the period where the thrift servers backup? > Yes: 2011-04-21 00:39:53,342 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on service_message,1_60443424997912576,1303351424896.ccdb0fe2e1976495836e926c57a7a36d. 2011-04-21 00:39:53,342 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for region service_message,1_60443424997912576,1303351424896.ccdb0fe2e1976495836e926c57a7a36d.. Current region memstore size 256.0m 2011-04-21 00:39:53,342 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores 2011-04-21 00:39:59,938 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/.tmp/7579206879034321398 to hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/common/3415152704800366912 2011-04-21 00:40:03,123 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/common/3415152704800366912, entries=866486, sequenceid=1934982293, memsize=212.7m, filesize=65.3m to service_message,1_60443424997912576,1303351424896.ccdb0fe2e1976495836e926c57a7a36d. 2011-04-21 00:40:05,611 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/.tmp/4177410838492147388 to hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/link/6676648044671628059 2011-04-21 00:40:06,420 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/link/6676648044671628059, entries=25296, sequenceid=1934982293, memsize=6.8m, filesize=2.5m to service_message,1_60443424997912576,1303351424896.ccdb0fe2e1976495836e926c57a7a36d. 2011-04-21 00:40:07,519 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/.tmp/2030906487055196702 to hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/photo/7887239631758414992 2011-04-21 00:40:09,520 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/photo/7887239631758414992, entries=10896, sequenceid=1934982293, memsize=3.4m, filesize=1.5m to service_message,1_60443424997912576,1303351424896.ccdb0fe2e1976495836e926c57a7a36d. 2011-04-21 00:40:11,344 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/.tmp/2825213906012468466 to hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/place/8085991053959255054 2011-04-21 00:40:11,540 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/place/8085991053959255054, entries=1219, sequenceid=1934982293, memsize=334.7k, filesize=122.1k to service_message,1_60443424997912576,1303351424896.ccdb0fe2e1976495836e926c57a7a36d. 2011-04-21 00:40:12,194 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/.tmp/7899422724259211613 to hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/text/7578566530358624021 2011-04-21 00:40:17,361 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/text/7578566530358624021, entries=108290, sequenceid=1934982293, memsize=32.5m, filesize=14.0m to service_message,1_60443424997912576,1303351424896.ccdb0fe2e1976495836e926c57a7a36d. 2011-04-21 00:40:17,519 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/.tmp/7207348071728550157 to hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/video/374590162251960181 2011-04-21 00:40:17,530 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://mtag1:9000/hbase/service_message/ccdb0fe2e1976495836e926c57a7a36d/video/374590162251960181, entries=924, sequenceid=1934982293, memsize=265.2k, filesize=102.5k to service_message,1_60443424997912576,1303351424896.ccdb0fe2e1976495836e926c57a7a36d. 2011-04-21 00:40:17,533 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~256.0m for region service_message,1_60443424997912576,1303351424896.ccdb0fe2e1976495836e92
-
Re: issue with thrift server pile up.tsuna 2011-04-21, 21:44
On Wed, Apr 20, 2011 at 3:45 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
> Hello -- we have an issue that looks like this. We have php app > front end and thrift servers that live on seperate boxes away from > HBASE cluster. Everytime we do compaction on one of our 8 RS servers, > we cause a thread pile up on Thrift servers that delay _all_ queries > to HBASE. Our usual level of threads on Thrift is around 250 threads, > but when we do a single compaction on a single region server, the > Thrift threads jump to about 5,000, which makes HBASE to be slow for > all regions , not just the one that compacts. How does one solve an > issue like that? I know with mysql we can setup slaves and just > redirect queries to slave zones that are faster, but what options do > we have to avoid HBASE slowdown as a whole? Writing a new Thrift server that's asynchronous and uses asynchbase would avoid this issue, I believe, because only requests that are blocked / slowed down by the compaction would be impacted, but other requests would still flow normally. I've been considering to do something like that as I believe we have the same problem at StumbleUpon (although to the impact is less in our case because our cluster is bigger than 8 RS). -- Benoit "tsuna" Sigoure Software Engineer @ www.StumbleUpon.com
-
Re: issue with thrift server pile up.Stack 2011-04-22, 03:28
Offline, Jack figured that upping his handlers -- he was at 25 and he
upped them to 100 -- cured his slow scans. He figured it by enabling rpc logging in a regionserver UI (Go to 'Log Level' and set org.apache.hadoop.ipc.HBaseServer to DEBUG). This shows vitals on all rpc requests. The logging is voluminous but you can see things like how long a request takes, how long its been waiting in the rpc queue to be processed, etc. St.Ack On Thu, Apr 21, 2011 at 2:44 PM, tsuna <[EMAIL PROTECTED]> wrote: > On Wed, Apr 20, 2011 at 3:45 PM, Jack Levin <[EMAIL PROTECTED]> wrote: >> Hello -- we have an issue that looks like this. We have php app >> front end and thrift servers that live on seperate boxes away from >> HBASE cluster. Everytime we do compaction on one of our 8 RS servers, >> we cause a thread pile up on Thrift servers that delay _all_ queries >> to HBASE. Our usual level of threads on Thrift is around 250 threads, >> but when we do a single compaction on a single region server, the >> Thrift threads jump to about 5,000, which makes HBASE to be slow for >> all regions , not just the one that compacts. How does one solve an >> issue like that? I know with mysql we can setup slaves and just >> redirect queries to slave zones that are faster, but what options do >> we have to avoid HBASE slowdown as a whole? > > Writing a new Thrift server that's asynchronous and uses asynchbase > would avoid this issue, I believe, because only requests that are > blocked / slowed down by the compaction would be impacted, but other > requests would still flow normally. I've been considering to do > something like that as I believe we have the same problem at > StumbleUpon (although to the impact is less in our case because our > cluster is bigger than 8 RS). > > -- > Benoit "tsuna" Sigoure > Software Engineer @ www.StumbleUpon.com > |