Martin Alig, 2012-06-20, 13:39
Elliott Clark, 2012-06-20, 16:31
Dave Wang, 2012-06-20, 17:04
Martin Alig, 2012-06-21, 11:27
Suraj Varma, 2012-07-03, 22:17
Martin Alig, 2012-07-12, 09:44
-
Blocking Inserts (Martin Alig, 2012-06-20, 13:39)
Hi
I'm doing some evaluations with HBase. The workload I'm facing is mainly insert-only. Currently I'm inserting 1 KB rows, with 100 bytes per column.

I have the following cluster machines at my disposal:

Intel Xeon L5520 2.26 GHz (Nehalem, with HT enabled)
24 GiB memory
1 GigE
2x 15k RPM SAS 73 GB (RAID1)

I have 10 nodes. The first node runs:

Namenode, SecondaryNamenode, Datanode, HMaster, Zookeeper, and a RegionServer

The other nodes run:

Datanode and RegionServer

When I run my test client and insert rows, the throughput goes up to 150'000 inserts/sec. But after some time the throughput drops to 0 inserts/sec for quite a while before it goes up again. My assumption is that this happens when the RegionServers start to write the data from memory to disk. I know that the recommended hardware for HBase should contain multiple disks using JBOD or RAID 0, but I am limited on that front right now.

I am just asking whether, in my hardware setup, the blocking periods are really caused by the non-optimal disk configuration.

Thank you in advance for any suggestions.

Martin
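To judge whether two mirrored disks per node can plausibly keep up, a rough estimate of the write volume helps. The sketch below is a back-of-envelope model, not an HBase formula: the HDFS replication factor of 3, the "WAL plus one memstore flush" multiplier, 1 KB = 1024 bytes, and an even spread over the 10 nodes of the first setup are all assumptions, and `estimate_write_load` is a hypothetical helper. Compaction rewrites and per-cell KeyValue overhead are ignored, so real disk traffic would be higher. Because the disks are in RAID1, each drive on a node has to absorb every one of those writes.

```python
# Back-of-envelope estimate of the write load behind the reported numbers.
# Assumptions (not from the thread): 1 KB = 1024 bytes, HDFS replication
# factor 3, load spread evenly over the nodes, and each ingested byte hits
# disk twice (once in the WAL, once when the memstore is flushed).
# Compactions, which rewrite store files again, and per-cell KeyValue
# overhead are ignored, so the real figure is higher.

MIB = 1024 * 1024


def estimate_write_load(inserts_per_sec, row_bytes, nodes,
                        replication=3, disk_writes_per_byte=2):
    """Return (client ingest, disk writes per node), both in bytes/sec."""
    ingest = inserts_per_sec * row_bytes
    cluster_disk_writes = ingest * replication * disk_writes_per_byte
    return ingest, cluster_disk_writes / nodes


if __name__ == "__main__":
    ingest, per_node = estimate_write_load(150_000, 1024, nodes=10)
    print(f"client ingest: {ingest / MIB:.1f} MiB/s")
    print(f"disk writes per node: {per_node / MIB:.1f} MiB/s")
```

With these assumptions, 150'000 inserts/sec works out to roughly 146.5 MiB/s of client ingest and roughly 87.9 MiB/s of disk writes per node before any compaction traffic.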
-
Re: Blocking Inserts (Elliott Clark, 2012-06-20, 16:31)
Basically without metrics on what's going on it's tough to know for sure.
I would turn on GC logging and make sure that is not playing a part, get metrics on IO while this is going on, and look through the logs to see what is happening when you notice the pause.
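Following the suggestion to look through the logs around each pause, a small script can list the silent stretches in a RegionServer log so they can be lined up with the client-side throughput drops and with the GC log. This is a sketch: the helper name `find_gaps` and the 30-second threshold are illustrative choices, and it assumes the log4j timestamp layout seen in the log excerpts later in this thread (e.g. `2012-06-21 11:42:13,962 INFO ...`).

```python
# Find silent periods in a RegionServer log: stretches where no line was
# written for longer than a threshold. Such gaps can then be compared with
# the times when client throughput dropped to zero and with GC log entries.
# Assumption: lines start with a timestamp like "2012-06-21 11:42:13,962".
# Lines without a timestamp (stack traces, wrapped messages) are skipped.

import re
import sys
from datetime import datetime, timedelta

TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def find_gaps(lines, threshold=timedelta(seconds=30)):
    """Yield (start, end, preceding_line) for each gap longer than threshold."""
    prev_ts, prev_line = None, None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue
        # %f accepts the 3-digit millisecond field and scales it to microseconds.
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if prev_ts is not None and ts - prev_ts > threshold:
            yield prev_ts, ts, prev_line.rstrip()
        prev_ts, prev_line = ts, line


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:
        for start, end, last in find_gaps(log):
            secs = (end - start).total_seconds()
            print(f"{start} -> {end} ({secs:.0f}s) after: {last}")
```

The line printed before each gap is often the most useful part, since it shows what the server was doing (for example a flush or compaction message) just before it went quiet.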
-
Re: Blocking Inserts (Dave Wang, 2012-06-20, 17:04)
I'd also remove the DN and RS from the node running ZK, NN, etc., as you don't want heavyweight processes on that node.

- Dave
-
Re: Blocking Inserts (Martin Alig, 2012-06-21, 11:27)
Thank you for the suggestions.
So I changed the setup and now have:

1 Master running Namenode, SecondaryNamenode, ZK and the HMaster
7 Slaves running Datanode and Regionserver
2 Clients to insert data

What I forgot to mention in my first post is that sometimes the clients even get a SocketTimeoutException when inserting the data (of course, during that time 0 inserts are done).

Looking at the logs (I also turned on the GC logs), I see the following.

Multiple consecutive entries like:

2012-06-21 11:42:13,962 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 6 on 60020' on region usertable,user600,1340200683555.a45b03dd65a62afa676488921e47dbaa.: memstore size 1.0g is >= than blocking 1.0g size

Shortly after those entries, many entries like:

2012-06-21 12:43:53,028 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":35046,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2642a14d), rpc version=1, client version=29, methodsFingerPrint=-1508511443","client":"10.110.129.12:54624","starttimems":1340275397981,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}

In the GC logs, many entries like:

2870.329: [GC 2870.330: [ParNew: 108450K->3401K(118016K), 0.0182570 secs] 4184711K->4079843K(12569856K), 0.0183510 secs] [Times: user=0.24 sys=0.00, real=0.01 secs]

but always around 0.01 to 0.04 secs. And also from the GC log:

2696.013: [CMS-concurrent-sweep: 8.999/10.448 secs] [Times: user=46.93 sys=2.24, real=10.45 secs]

Is 10.45 secs too long? Or what exactly should I watch out for in the GC logs?

I also configured Ganglia to look at some more metrics. Looking at io_wait (which should matter for my question about the disks), I observe values between 10 % and 25 % on the RegionServer. Should that be lower?

Btw, I'm using HBase 0.94 and Hadoop 1.0.3.

Thank you again.
Martin
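On the question of what to look for in the GC logs: with the CMS collector, ParNew young-generation collections stop the application, while the `CMS-concurrent-*` phases (such as the 10.45 s sweep above) run alongside it, so their duration is not a pause by itself. The long stop-the-world events worth searching for are `Full GC` entries, `concurrent mode failure` and `promotion failed`. The rough sketch below separates timed entries by their `real=` value; the classification rule and the helper name `classify` are assumptions, and it does not reassemble entries that HotSpot splits across several lines.

```python
# Classify HotSpot CMS GC log entries into stop-the-world pauses and
# concurrent phases, based on the "real=" wall-clock time.
# Heuristic (assumption): an entry containing "CMS-concurrent" ran alongside
# the application threads; every other timed entry (ParNew, CMS-initial-mark,
# CMS-remark, Full GC) is treated as a pause. Entries with a warning marker
# always count as pauses and are also collected for manual inspection.

import re
import sys

REAL_RE = re.compile(r"real=(\d+\.\d+) secs")
DANGER = ("concurrent mode failure", "promotion failed", "Full GC")


def classify(lines):
    """Return (pause_seconds, concurrent_seconds, danger_lines)."""
    pauses, concurrent, danger = [], [], []
    for line in lines:
        is_danger = any(marker in line for marker in DANGER)
        if is_danger:
            danger.append(line.rstrip())
        m = REAL_RE.search(line)
        if not m:
            continue
        real = float(m.group(1))
        if "CMS-concurrent" in line and not is_danger:
            concurrent.append(real)
        else:
            pauses.append(real)
    return pauses, concurrent, danger


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:
        pauses, concurrent, danger = classify(log)
    if pauses:
        print(f"{len(pauses)} pauses, longest {max(pauses):.2f}s")
    if concurrent:
        print(f"{len(concurrent)} concurrent phases, longest {max(concurrent):.2f}s")
    for line in danger:
        print("check:", line)
```

If the longest pauses stay in the 0.01 to 0.04 s range and no warning markers appear, GC is unlikely to explain stalls lasting tens of seconds.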
-
Re: Blocking Inserts (Suraj Varma, 2012-07-03, 22:17)
In your case, you are likely hitting the blocking store files limit (hbase.hstore.blockingStoreFiles, default: 7) and/or hbase.hregion.memstore.block.multiplier. Check out http://hbase.apache.org/book/config.files.html for more details on these configurations and how they affect your insert performance.

On Ganglia, also check whether your compaction queue spikes during these timeouts.

--Suraj
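The `hbase.hregion.memstore.block.multiplier` setting explains the "memstore size 1.0g is >= than blocking 1.0g size" lines: a region stops accepting updates once its memstore reaches the flush size times the multiplier, and stays blocked until a flush brings it back down. The sketch below writes out that arithmetic together with the RegionServer-wide memstore limit (`hbase.regionserver.global.memstore.upperLimit`, the 0.94-era property name). The function names are hypothetical helpers, the 512 MB flush size is an assumed setting chosen only because it reproduces the 1.0g in the log, and the 0.4 fraction is believed to be the 0.94 default; check hbase-default.xml for your version.

```python
# Sketch of the two memstore thresholds that can block writes on a
# RegionServer (HBase 0.94-era property names). The numbers used below
# are illustrative: a 512 MB flush size with a multiplier of 2 is one
# hypothetical configuration that reproduces the "blocking 1.0g size" in
# the log above; check hbase-site.xml / hbase-default.xml for real values.

MB = 1024 * 1024


def region_blocking_size(flush_size, block_multiplier):
    """Per-region limit: hbase.hregion.memstore.flush.size
    times hbase.hregion.memstore.block.multiplier."""
    return flush_size * block_multiplier


def global_blocking_size(heap_bytes, upper_limit_fraction):
    """RegionServer-wide limit: heap size times
    hbase.regionserver.global.memstore.upperLimit."""
    return int(heap_bytes * upper_limit_fraction)


def region_writes_blocked(memstore_bytes, flush_size, block_multiplier):
    # The comparison mirrors the ">=" wording of the "Blocking updates" log line.
    return memstore_bytes >= region_blocking_size(flush_size, block_multiplier)


if __name__ == "__main__":
    flush_size, multiplier = 512 * MB, 2  # hypothetical settings
    print("region blocking size:", region_blocking_size(flush_size, multiplier) // MB, "MB")
    heap = 12569856 * 1024  # approximate heap capacity from the ParNew log line above
    print("global limit at 0.4:", global_blocking_size(heap, 0.4) // MB, "MB")
```

A larger flush size or multiplier only postpones the block: if flushes cannot drain memstores as fast as the clients fill them, every region eventually reaches its limit anyway.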
-
Re: Blocking Inserts (Martin Alig, 2012-07-12, 09:44)
Thank you for the comment.
The compaction queue seems to be at 0 (?) all the time.

About the blocking store files: I already increased this value, but I could not see any improvement.

Going through the logs during a "blocking" period, I often see a "CompactionRequest". Then nothing is logged for a minute or so, and then it continues. Similarly, I sometimes see "Finished memstore flush" in the logs, then nothing for 2 minutes, and then it continues, and of course the insertions resume as well.

Is this just normal behavior, or did I misconfigure something?
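One plausible reading of the last observation (a CompactionRequest or a finished flush, then a minute or two of silence) combines both settings mentioned earlier in the thread. The toy model below is not HBase code: it assumes that a flush is held back while a store has more than `hbase.hstore.blockingStoreFiles` files, for at most `hbase.hstore.blockingWaitTime`, and that writes to the region stay blocked from the moment its memstore reaches the blocking size until the delayed flush completes. The helper names and every input in the example are hypothetical, chosen only to show how the pieces interact. It may also suggest why raising blockingStoreFiles alone does not help if compactions cannot keep up with the flush rate on two mirrored disks.

```python
# Toy model (an assumption-laden simplification, not HBase code) of why
# a flush can stall: when a store already has more files than
# hbase.hstore.blockingStoreFiles, the flush waits for a compaction to
# bring the count down, but for no longer than hbase.hstore.blockingWaitTime.
# While the flush waits, the memstore keeps filling and can hit the
# blocking size, at which point client writes stop.
# Whether the real check is ">" or ">=" is an assumption here.


def flush_delay(store_files, blocking_store_files, compaction_secs, blocking_wait_secs):
    """Seconds a flush is held back before it runs."""
    if store_files <= blocking_store_files:
        return 0.0
    return float(min(compaction_secs, blocking_wait_secs))


def write_stall(memstore_bytes, ingest_bytes_per_sec, blocking_size, delay_secs, flush_secs):
    """Seconds during which writes are blocked for one region (toy model).

    Writes keep arriving until the memstore reaches blocking_size; from then
    on they are blocked until the delayed flush starts and completes.
    """
    time_to_block = max(0.0, (blocking_size - memstore_bytes) / ingest_bytes_per_sec)
    return max(0.0, delay_secs + flush_secs - time_to_block)


if __name__ == "__main__":
    MB = 1024 * 1024
    # Made-up inputs, only to show how the pieces interact.
    delay = flush_delay(store_files=10, blocking_store_files=7,
                        compaction_secs=120, blocking_wait_secs=90)
    stall = write_stall(memstore_bytes=512 * MB, ingest_bytes_per_sec=16 * MB,
                        blocking_size=1024 * MB, delay_secs=delay, flush_secs=10)
    print(f"flush delayed {delay:.0f}s, writes blocked for about {stall:.0f}s")
```

In this model, the stall shrinks when compactions finish faster (fewer store files at flush time) or when the memstore has more headroom, which is why compaction throughput on the available disks matters as much as the thresholds themselves.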