Ricardo Vilaça
2012-10-10, 12:51
Stack
2012-10-11, 03:24
Mohit Anchlia
2012-10-11, 04:25
Ricardo Vilaça
2012-10-12, 10:56
Vincent Barat
2012-11-20, 18:54
Stack
2012-11-21, 05:39
Stack
2012-11-21, 05:47
Vincent Barat
2012-11-21, 18:37
-
HBase Tuning
Ricardo Vilaça 2012-10-10, 12:51
Hi,
I'm doing some experiments with HBase 0.92 and Hadoop 1.0.1. We have a small cluster of dual-core machines with 8 GB of RAM. The cluster has: 1 node running the NameNode and HBase master; 1 node running ZooKeeper; and 20 nodes running a RegionServer co-located with a DataNode.

The application has 8 tables, all of them partitioned into regions, for a total of 164 regions in the cluster. The regions are evenly distributed across all RegionServers, about 8 regions per RegionServer, less than 1.5 GB of data.

We have already done some configuration tuning of HDFS and HBase, the main parameters being:

* hbase.regionserver.handler.count=100
* hfile.block.cache.size=0.5
* hbase.regionserver.global.memstore.upperLimit=0.15
* hbase.regionserver.global.memstore.lowerLimit=0.1
* hbase.hregion.memstore.mslab.enabled=true
* HBASE_REGIONSERVER_OPTS="-Xmx6546m -Xms4046m -Xmn128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
* dfs.replication=3
* dfs.datanode.max.xcievers=16384
* dfs.datanode.handler.count=4

Clients are running on quad-core machines, also with 8 GB of RAM. The application has several clients (each running in a thread). A single client node is able to handle 400 clients with linear throughput and acceptable performance (below 1.5 seconds) for application operations (each involving several HBase operations). In detail, the mix of HBase operations per second is as follows:

* 480 scans with an average size of 45
* 88 single row puts
* 51 batch gets with an average size of 100
* 55 deletes
* 830 single row gets

With this configuration the RegionServers have no IO wait, the blockCacheHitRatio is almost 100%, hdfsBlocksLocalityIndex is 100, network usage is low, and the CPU on all RegionServers is more than 90% idle. However, when adding an additional client node, also with 400 clients, the latency increases 3 times, but the RegionServers remain more than 80% idle.
I have tried different values for hbase.regionserver.handler.count and also for the hbase.client.ipc.pool size and type, but without any improvement.

Is there any configuration parameter that can improve the latency with several concurrent threads and more than one HBase client node? Which JMX parameters should I monitor on the RegionServers to check what may be causing this, and how could I achieve better CPU utilization at the RegionServers?

Regards,
--
Ricardo Vilaça
---
High-Assurance Software Lab
INESC TEC & Universidade do Minho
http://gsd.di.uminho.pt/members/rmvilaca
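For readers following along, here is a rough sanity check of the numbers in the message above. It is only a sketch and not something the poster ran. It assumes that "average size" means rows per scan and gets per batch, that the listed rates are cluster-wide, and that the block cache and memstore limits are fractions of the maximum heap (-Xmx). The variable names are made up for illustration.

```python
# Operation mix per second as listed in the message: (calls, items per call).
# Assumption: "average size" = rows per scan / gets per batch.
ops_per_sec = {
    "scan": (480, 45),
    "put": (88, 1),
    "batch_get": (51, 100),
    "delete": (55, 1),
    "get": (830, 1),
}
region_servers = 20

# Client-level calls and row-level items per second, in total and per server.
calls = sum(n for n, _ in ops_per_sec.values())
rows = sum(n * size for n, size in ops_per_sec.values())
calls_per_rs = calls / region_servers
rows_per_rs = rows / region_servers

# Heap budget implied by the posted settings (fractions of -Xmx6546m).
heap_mb = 6546
block_cache_mb = heap_mb * 0.5
memstore_upper_mb = heap_mb * 0.15
memstore_lower_mb = heap_mb * 0.1

print(f"calls/s: {calls}, rows/s: {rows}")
print(f"per RegionServer: {calls_per_rs:.1f} calls/s, {rows_per_rs:.1f} rows/s")
print(f"block cache: {block_cache_mb:.0f} MB, "
      f"memstore limits: {memstore_lower_mb:.1f}-{memstore_upper_mb:.1f} MB")
```

Under those assumptions the mix comes to about 1504 calls and 27673 rows per second, or roughly 75 calls per second per RegionServer, which fits the report of nearly idle servers.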
-
Re: HBase Tuning
Stack 2012-10-11, 03:24
On Wed, Oct 10, 2012 at 5:51 AM, Ricardo Vilaça <[EMAIL PROTECTED]> wrote:
> However, when adding an additional client node, with also 400 clients,
> the latency increases 3 times,
> but the RegionServers remains idle more than 80%. I had tried different
> values for the hbase.regionserver.handler.count and also
> for the hbase.client.ipc.pool size and type but without any improvement.
>

I was going to suggest that it sounded like all handlers are occupied... but it sounds like you tried upping them.

Is this going from one client node (serving 400 clients) to two client nodes (serving 800 clients)?

Where are you measuring from? Application side? Can you figure out whether we are binding up in HBase or in the client node?

What does a client node look like? Is it something hosting an hbase client? A webserver or something?

> Is there any configuration parameter that can improve the latency with
> several concurrent threads and more than one HBase client node
> and/or which JMX parameters should I monitor on RegionServers to check
> what may be causing this and how could I achieve better utilization of CPU
> at RegionServers?
>

It sounds like all your data is memory resident given its size and the lack of iowait. Is that so? Studying the regionserver metrics, are they fairly constant across the addition of the new client node?

St.Ack
-
Re: HBase Tuning
Mohit Anchlia 2012-10-11, 04:25
What's the best way to see if all handlers are occupied? I am probably running into a similar issue but would like to check.
-
Re: HBase Tuning
Ricardo Vilaça 2012-10-12, 10:56
Hi,
On 11/10/12 04:24, Stack wrote:
> On Wed, Oct 10, 2012 at 5:51 AM, Ricardo Vilaça <[EMAIL PROTECTED]> wrote:
>> However, when adding an additional client node, with also 400 clients,
>> the latency increases 3 times,
>> but the RegionServers remains idle more than 80%. I had tried different
>> values for the hbase.regionserver.handler.count and also
>> for the hbase.client.ipc.pool size and type but without any improvement.
>>
> I was going to suggest that it sounded like all handlers are
> occupied... but it sounds like you tried upping them.

Yes, I had already tried increasing it to 200, but without any improvement in the application latency. However, the output for the active IPC handlers in the Web interface is strange. On the region servers I can see at most 4 IPC handlers active at any given instant, but if I look at the state of all the other IPC handlers, they have been waiting for 0 seconds. On the master the IPC handlers are also almost all in the waiting state, but for a few seconds.

> Is this going from one client node (serving 400 clients) to two client
> nodes (serving 800 clients)?

Yes, the huge increase in latency happens when going from one client node to two client nodes. Increasing the number of clients on a single node also adds latency, but only a small amount.

> Where are you measuring from? Application side? Can you figure if we
> are binding up in HBase or in the client node?

These measurements are from the application side. As the huge increase in latency happens when increasing the number of clients, I suspect that the binding up is in HBase, maybe due to some incorrect configuration.

> What does a client node look like? It is something hosting an hbase
> client? A webserver or something?

Yes, the client node is hosting an HBase client.
>> Is there any configuration parameter that can improve the latency with
>> several concurrent threads and more than one HBase client node
>> and/or which JMX parameters should I monitor on RegionServers to check
>> what may be causing this and how could I achieve better utilization of CPU
>> at RegionServers?
>>
> It sounds like all your data is memory resident given its size and the
> lack of iowait. Is that so? Studying the regionserver metrics, are
> they fairly constant across the addition of the new client node?

Yes, all data is memory resident. As far as I can see, the regionserver metrics are fairly constant.

Thanks,
--
Ricardo Vilaça
---
High-Assurance Software Lab
INESC TEC & Universidade do Minho
http://gsd.di.uminho.pt/members/rmvilaca
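One way to read the figures in this exchange is through Little's law. The sketch below is not from the thread. It assumes a closed loop in which each of the 400 (or 800) client threads issues its next operation as soon as the previous one returns, so total throughput is roughly clients divided by latency. The function name and the `think_time_s` parameter are made up for illustration.

```python
def closed_loop_throughput(clients, latency_s, think_time_s=0.0):
    """Little's law for a closed system: X = N / (R + Z).

    clients      -- number of concurrent client threads (N)
    latency_s    -- average time per operation as seen by a client (R)
    think_time_s -- pause between operations (Z); assumed zero here
    """
    return clients / (latency_s + think_time_s)


# Only the ratio matters, so the baseline latency is an arbitrary unit.
base_latency_s = 1.0
one_node = closed_loop_throughput(400, base_latency_s)
two_nodes = closed_loop_throughput(800, 3 * base_latency_s)
ratio = two_nodes / one_node

print(f"throughput with two client nodes relative to one: {ratio:.2f}x")
```

If that assumption holds, doubling the clients while latency triples means aggregate throughput falls to about two thirds of the single-node figure. With the RegionServers still mostly idle, that pattern would point at something being serialized rather than at the servers running out of capacity. With a non-zero think time the ratio would be different.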
-
Re: HBase Tuning
Vincent Barat 2012-11-20, 18:54
Hi,
It seems there is potential contention in the HBase client code (a needlessly synchronized method). You may try this patch: https://issues.apache.org/jira/browse/HBASE-7069

I have faced similar issues on my production cluster since I upgraded to HBase 0.92. I will test this patch tomorrow... More info later.

Cheers
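To show why a single synchronized method on the client could produce this symptom (latency climbing while the servers stay idle), here is a generic sketch. It does not reproduce the HBase client code or the change in HBASE-7069. `fake_rpc` is a made-up stand-in that only sleeps, and the shared lock plays the role of the synchronized method.

```python
import threading
import time


def run_clients(n_threads, call, shared_lock=None):
    """Run `call` once in each of n_threads threads; return elapsed seconds.

    If shared_lock is given, every call is made while holding it, which
    mimics a client-side method that all threads must enter one at a time.
    """
    def worker():
        if shared_lock is None:
            call()
        else:
            with shared_lock:
                call()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def fake_rpc():
    # Stand-in for a remote call: the thread waits, it does not burn CPU.
    time.sleep(0.02)


if __name__ == "__main__":
    parallel = run_clients(8, fake_rpc)
    serialized = run_clients(8, fake_rpc, shared_lock=threading.Lock())
    print(f"no shared lock: {parallel:.3f}s, shared lock: {serialized:.3f}s")
```

With the shared lock the eight calls run back to back, so the elapsed time grows with the number of threads even though nothing is busy. That is the general shape of the problem being suspected here, not a measurement of it.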
-
Re: HBase Tuning
Stack 2012-11-21, 05:39
On Wed, Oct 10, 2012 at 9:25 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> What's the best way to see if all handlers are occupied? I am probably
> running into similar issue but would like to check.
>

In 0.94, in the UI, you can see what all the handlers are doing. Click on 'Show All RPC Handler Tasks'. If you are not on 0.94, click on the thread dump and check how many handlers are in a state other than waiting. Do it a few times.

St.Ack
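Counting handler states by hand gets tedious when it has to be repeated, so here is a small sketch that does the counting on a saved dump. It assumes jstack-style text, where each thread starts with a quoted name and is followed by a `java.lang.Thread.State:` line. The dump shown by the web UI may be laid out differently and need other patterns. The default prefix "IPC Server handler" is also an assumption about how the handler threads are named on your version.

```python
import re
from collections import Counter

HEADER = re.compile(r'^"([^"]+)"')
STATE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def handler_states(dump_text, name_prefix="IPC Server handler"):
    """Count JVM thread states for threads whose name starts with name_prefix."""
    counts = Counter()
    current = None
    for line in dump_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        state = STATE.search(line)
        if state and current is not None:
            if current.startswith(name_prefix):
                counts[state.group(1)] += 1
            current = None  # only the first state line belongs to a thread
    return counts


def busy_handlers(counts):
    """Handlers that are in any state other than a waiting one."""
    return sum(n for s, n in counts.items()
               if s not in ("WAITING", "TIMED_WAITING"))


if __name__ == "__main__":
    import sys
    states = handler_states(sys.stdin.read())
    print(dict(states), "busy:", busy_handlers(states))
```

Feeding it several dumps taken a few seconds apart (for example `jstack <pid> | python handler_states.py`, where the script name is hypothetical) gives a quick picture of whether the handlers are ever all busy at once.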
-
Re: HBase Tuning
Stack 2012-11-21, 05:47
On Fri, Oct 12, 2012 at 3:56 AM, Ricardo Vilaça <[EMAIL PROTECTED]> wrote:
> Yes, had already tried to increase to 200 but without improvement
> on the application latency. However, the output of the active IPC
> handlers, using the Web interface, is strange. For region servers I can
> see in a given instant at most 4 IPC handler active but if I
> see the state of all other IPC handlers they are waiting for 0 seconds.

Where are they waiting? Want to paste a thread dump on pastebin or something when you are seeing the phenomenon and paste a link in here? Are all handlers doing work?

> In the master the IPC handlers are also almost all in the waiting state
> but for a few seconds.
>> Is this going from one client node (serving 400 clients) to two client
>> nodes (serving 800 clients)?
> Yes, the huge increase in latency is when going for one client node to
> two client nodes. However, increasing the number of clients in a single
> node also adds to latency but a small increase.
>> Where are you measuring from? Application side? Can you figure if we
>> are binding up in HBase or in the client node?
> This measures are from the application side. As the huge increase in
> latency is happening when increasing the number of clients I suspect
> that the binding up is in the HBase maybe due to some incorrect
> configuration.
>

What happens if you run 10 client instances, each with ten threads, doing your task list?

>> What does a client node look like? It is something hosting an hbase
>> client? A webserver or something?
> Yes, the client node is hosting an HBase client.
>>> Is there any configuration parameter that can improve the latency with
>>> several concurrent threads and more than one HBase client node
>>> and/or which JMX parameters should I monitor on RegionServers to check
>>> what may be causing this and how could I achieve better utilization of CPU
>>> at RegionServers?
>>>
>> It sounds like all your data is memory resident given its size and the
>> lack of iowait. Is that so? Studying the regionserver metrics, are
>> they fairly constant across the addition of the new client node?
>
> Yes, all data is memory resident. As far as I can see, the regionserver
> metrics are fairly constant.
>

Do you have cluster diagrams? Is the amount of traffic in and out of the box constant when you up the number of client instances from 400 to 800? You are not hitting something silly, like being network bound?

St.Ack
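To put a number on the "network bound" question, one option is to sample an interface's byte counter twice and compare the rate with the link speed. The sketch below shows only the arithmetic. The 1 Gbit/s default is a hypothetical link speed, not something stated in the thread, and where the counters come from (OS interface statistics, monitoring graphs) is left open.

```python
def link_utilization(bytes_before, bytes_after, interval_s,
                     link_bits_per_s=1_000_000_000):
    """Fraction of link capacity used between two byte-counter samples.

    link_bits_per_s defaults to 1 Gbit/s, which is an assumption; set it
    to the real speed of the interface being checked.
    """
    bits_transferred = (bytes_after - bytes_before) * 8
    return bits_transferred / interval_s / link_bits_per_s


if __name__ == "__main__":
    # Illustrative counter values only, not measurements from this cluster.
    used = link_utilization(0, 62_500_000, interval_s=10.0)
    print(f"link utilization: {used:.1%}")
```

Running the same calculation on a client node and on a RegionServer, with 400 and then 800 clients, would show whether traffic scales with the client count or flattens out near the link capacity.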
-
Re: HBase Tuning
Vincent Barat 2012-11-21, 18:37
Forget about this: it does not help
On 20/11/12 19:54, Vincent Barat wrote:
> Hi,
>
> It seems there is a potential contention in the HBase client code
> (a useless synchronized method)
> You may try to use this patch :
> https://issues.apache.org/jira/browse/HBASE-7069
>
> I face similar issues on my production cluster since I upgraded to
> HBase 0.92. I will test this patch tomorrow...
> More info later.
>
> Cheers