|
|
-
MemStoreFlusher: region has too many store files (client timeout)
kzurek 2013-02-04, 12:45
I'm facing some issues regarding to major compaction. I've disabled major compaction and it is not triggered manually, but when I'm loading data to selected region, I saw that major compaction queue is growing and it is being triggered ('Large Compaction' in logs) quite often (mainly due to cacheFlusher). Moreover, I've noticed that my clients app gets timeout while putting data into the cluster (happens when memory store flusher is trying to dump memory content to store file, but it cannot due to too many store files), also drop in data rate, which in this case is obvious, is noticeable. For me, it looks like compaction process is not fast enough comparing to incoming rate of data or ... maybe something else?? and finally is blocking the update process.
Basic information: HBase Version: 0.92.1, r1298924 Hadoop Version: 1.0.3, r1335192
Logs: 2013-02-01 15:43:14,627 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Large Compaction requested: regionName=test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158., storeName=data, fileCount=3, fileSize=478.3m (249.8m, 113.7m, 114.7m), priority=-3, time=1051078047102762; Because: regionserver60020.cacheFlusher; compaction_queue=(48:0), split_queue=0 2013-02-01 15:43:14,627 DEBUG org.apache.hadoop.hbase.regionserver.Store: 3b710693d6314c2a987b07dd82451158 - tag: no store files to compact 2013-02-01 15:43:14,709 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":61908,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@57a56081), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"192.168.1.68:49419","starttimems":1359729732799,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"} 2013-02-01 15:43:14,710 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@57a56081), rpc version=1, client version=29, methodsFingerPrint=54742778 from 192.168.1.68:49419: output error 2013-02-01 15:43:14,710 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020 caught: java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324) at org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1710) at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1653) at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:924) at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1003) at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:409) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1346)
2013-02-01 15:43:19,397 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. 2013-02-01 15:43:19,397 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. has too many store files; delaying flush up to 90000ms 2013-02-01 15:43:19,397 DEBUG org.apache.hadoop.hbase.regionserver.Store: 3b710693d6314c2a987b07dd82451158 - data: no store files to compact 2013-02-01 15:43:19,397 DEBUG org.apache.hadoop.hbase.regionserver.Store: 3b710693d6314c2a987b07dd82451158 - tag: no store files to compact 2013-02-01 15:43:55,693 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 10 on 60020' on region test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158.: memstore size 1.5g is >= than blocking 1.5g size 2013-02-01 15:44:49,452 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Waited 90055ms on a compaction to clean up 'too many store files'; waited long enough... proceeding with flush of test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. 2013-02-01 15:44:49,452 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158., current region memstore size 1.5g 2013-02-01 15:44:49,453 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158., commencing wait for mvcc, flushsize=1624094576 2013-02-01 15:44:50,014 DEBUG org.apache.hadoop.hbase.io.hfile.HFileWriterV2: Initialized with CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false] 2013-02-01 15:44:55,082 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/.tmp/7ddee02376664ac387eb3c786c541ed0 to hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/data/7ddee02376664ac387eb3c786c541ed0 2013-02-01 15:44:55,086 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://SOME-PC:9000/hbase/test/3b710693d6314c2a987b07dd82451158/data/7ddee02376664ac387eb3c786c541ed0, entries=7157100, sequenceid=58642, memsize=1.5g, filesize=113.8m 2013-02-01 15:44:55,087 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. 'IPC Server handler 10 on 60020' 2013-02-01 15:44:55,087 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~1.5g/1624094576, currentsize=0.0/0 for region test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. in 5635ms, sequenceid=58642, compaction requested=false
+
kzurek 2013-02-04, 12:45
-
Re: MemStoreFlusher: region has too many store files (client timeout)
Kevin O'dell 2013-02-04, 14:46
Kzurek,
Just because you turn off time based major compactions, it does not mean that you have turned major compaction off. Compaction can still be promoted to be Majors. Also, the only real difference between a major and minor compaction is one processes deletes. You should really schedule at least daily major compactions. As for your blocking issue, there are quite a few things you would want to look at:
How often are you flushing?
How often are you force flushing from HLog rolls?
What size are your flushes?
What does your region count look like as that can affect your flush size?
etc
When I see HBase blocking constantly it is usually a sign that you need to do some fine grain tuning.
On Mon, Feb 4, 2013 at 7:45 AM, kzurek <[EMAIL PROTECTED]> wrote:
> I'm facing some issues regarding to major compaction. I've disabled major > compaction and it is not triggered manually, but when I'm loading data to > selected region, I saw that major compaction queue is growing and it is > being triggered ('Large Compaction' in logs) quite often (mainly due to > cacheFlusher). Moreover, I've noticed that my clients app gets timeout > while > putting data into the cluster (happens when memory store flusher is trying > to dump memory content to store file, but it cannot due to too many store > files), also drop in data rate, which in this case is obvious, is > noticeable. For me, it looks like compaction process is not fast enough > comparing to incoming rate of data or ... maybe something else?? and > finally > is blocking the update process. > > Basic information: > HBase Version: 0.92.1, r1298924 > Hadoop Version: 1.0.3, r1335192 > > Logs: > 2013-02-01 15:43:14,627 DEBUG > org.apache.hadoop.hbase.regionserver.CompactSplitThread: Large Compaction > requested: > > regionName=test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158., > storeName=data, fileCount=3, fileSize=478.3m (249.8m, 113.7m, 114.7m), > priority=-3, time=1051078047102762; Because: > regionserver60020.cacheFlusher; > compaction_queue=(48:0), split_queue=0 > 2013-02-01 15:43:14,627 DEBUG org.apache.hadoop.hbase.regionserver.Store: > 3b710693d6314c2a987b07dd82451158 - tag: no store files to compact > 2013-02-01 15:43:14,709 WARN org.apache.hadoop.ipc.HBaseServer: > (responseTooSlow): > > {"processingtimems":61908,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@57a56081 > ), > rpc version=1, client version=29, > methodsFingerPrint=54742778","client":"192.168.1.68:49419 > ","starttimems":1359729732799,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"} > 2013-02-01 15:43:14,710 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server > Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@57a56081 > ), > rpc version=1, client version=29, methodsFingerPrint=54742778 from > 192.168.1.68:49419: output error > 2013-02-01 15:43:14,710 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server > handler 3 on 60020 caught: java.nio.channels.ClosedChannelException > at > sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133) > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324) > at > org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1710) > at > org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1653) > at > > org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:924) > at > > org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1003) > at > > org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:409) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1346) > > 2013-02-01 15:43:19,397 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: > Flush requested on > > test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158. > 2013-02-01 15:43:19,397 WARN
Kevin O'Dell Customer Operations Engineer, Cloudera
+
Kevin O'dell 2013-02-04, 14:46
-
Re: MemStoreFlusher: region has too many store files (client timeout)
kzurek 2013-02-05, 08:39
Thanks for the reply, although I should clear some misunderstandings. In general, I do know general behavior and difference between minor and major compaction, as well as when minor compaction might become (could be promoted) to major compaction. I just wanted to verify influence of compaction (mostly major) on our cluster performance. Thus, I've created test where I'm putting data to only one region (total 71) by one single threaded client using build in caching mechanism (according to "HBase The Definitive Guide" book) and triggering major compaction by hand (HBaseAdmin). Although, after few tests I've noticed that major compaction (Large Compaction) is being triggered (cache flusher, recursive queue) so I left it as it was (not triggering it anymore). That brought me to this situation, where I'm putting data and after a while I'm getting timeouts on the client, in meanwhile I see that memstore is being flush which cant create new store file (cause there are to many of them) and which is frequently blocked by compaction process. I hope that this short description will bring closer look at the issue. In addition, here are some answers to your questions: 1. How often are you flushing? I'm not triggering flushing by hand, but I've noticed that data is being flushed every 4s (275m) or 1m 30s-40s (1.5g). 2. How often are you force flushing from HLog rolls? Default settings are: blocksize=64 MB, rollsize=60.8 MB, enabled=true, optionallogflushinternal=1000ms. It seems that roll is made every hour. 3. What size are your flushes? Depends, from 275m up to 1.5g. I've set my memstore flush size to 256m and memstore block multiplier to 6. Should I increase the flush size?? 4. What does your region count look like as that can affect your flush size? Initial split is 37 regions on 6 RegionServers, but at the moment there are 71 regions. Kevin O'dell wrote > Kzurek, > > Just because you turn off time based major compactions, it does not mean > that you have turned major compaction off. Compaction can still be > promoted to be Majors. Also, the only real difference between a major and > minor compaction is one processes deletes. You should really schedule at > least daily major compactions. As for your blocking issue, there are > quite > a few things you would want to look at: > > How often are you flushing? > > How often are you force flushing from HLog rolls? > > What size are your flushes? > > What does your region count look like as that can affect your flush size? > > etc > > When I see HBase blocking constantly it is usually a sign that you need to > do some fine grain tuning. > > On Mon, Feb 4, 2013 at 7:45 AM, kzurek < > kzurek@ > > wrote: > >> I'm facing some issues regarding to major compaction. I've disabled major >> compaction and it is not triggered manually, but when I'm loading data to >> selected region, I saw that major compaction queue is growing and it is >> being triggered ('Large Compaction' in logs) quite often (mainly due to >> cacheFlusher). Moreover, I've noticed that my clients app gets timeout >> while >> putting data into the cluster (happens when memory store flusher is >> trying >> to dump memory content to store file, but it cannot due to too many store >> files), also drop in data rate, which in this case is obvious, is >> noticeable. For me, it looks like compaction process is not fast enough >> comparing to incoming rate of data or ... maybe something else?? and >> finally >> is blocking the update process. >> >> Basic information: >> HBase Version: 0.92.1, r1298924 >> Hadoop Version: 1.0.3, r1335192 >> >> Logs: >> 2013-02-01 15:43:14,627 DEBUG >> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Large Compaction >> requested: >> >> regionName=test,\x00\x00\x00\x00~i\x91\x00\x00\x00\x0D,1359115210217.3b710693d6314c2a987b07dd82451158., >> storeName=data, fileCount=3, fileSize=478.3m (249.8m, 113.7m, 114.7m), >> priority=-3, time=1051078047102762; Because: >> regionserver60020.cacheFlusher; View this message in context: http://apache-hbase.679495.n3.nabble.com/MemStoreFlusher-region-has-too-many-store-files-client-timeout-tp4037887p4037949.htmlSent from the HBase User mailing list archive at Nabble.com.
+
kzurek 2013-02-05, 08:39
-
Re: MemStoreFlusher: region has too many store files (client timeout)
Kevin O'dell 2013-02-05, 14:20
kzurek, Okay, that is a fair flush size(I have not seen larger than 256MB). Do you think you could send me a RS log from yesterday while you were seeing this issue? I would just like to thumb through it and make some tuning recommendations. I have a pretty good idea of what you need to do, if you look at: http://gbif.blogspot.com/2012/07/optimizing-writes-in-hbase.html The article has some good ideas for write tuning. On Tue, Feb 5, 2013 at 3:39 AM, kzurek <[EMAIL PROTECTED]> wrote: > Thanks for the reply, although I should clear some misunderstandings. In > general, I do know general behavior and difference between minor and major > compaction, as well as when minor compaction might become (could be > promoted) to major compaction. I just wanted to verify influence of > compaction (mostly major) on our cluster performance. Thus, I've created > test where I'm putting data to only one region (total 71) by one single > threaded client using build in caching mechanism (according to "HBase The > Definitive Guide" book) and triggering major compaction by hand > (HBaseAdmin). Although, after few tests I've noticed that major compaction > (Large Compaction) is being triggered (cache flusher, recursive queue) so I > left it as it was (not triggering it anymore). That brought me to this > situation, where I'm putting data and after a while I'm getting timeouts on > the client, in meanwhile I see that memstore is being flush which cant > create new store file (cause there are to many of them) and which is > frequently blocked by compaction process. I hope that this short > description > will bring closer look at the issue. In addition, here are some answers to > your questions: > > 1. How often are you flushing? > I'm not triggering flushing by hand, but I've noticed that data is being > flushed every 4s (275m) or 1m 30s-40s (1.5g). > 2. How often are you force flushing from HLog rolls? > Default settings are: blocksize=64 MB, rollsize=60.8 MB, enabled=true, > optionallogflushinternal=1000ms. It seems that roll is made every hour. > 3. What size are your flushes? > Depends, from 275m up to 1.5g. I've set my memstore flush size to 256m and > memstore block multiplier to 6. Should I increase the flush size?? > 4. What does your region count look like as that can affect your flush > size? > Initial split is 37 regions on 6 RegionServers, but at the moment there are > 71 regions. > > > > Kevin O'dell wrote > > Kzurek, > > > > Just because you turn off time based major compactions, it does not mean > > that you have turned major compaction off. Compaction can still be > > promoted to be Majors. Also, the only real difference between a major > and > > minor compaction is one processes deletes. You should really schedule at > > least daily major compactions. As for your blocking issue, there are > > quite > > a few things you would want to look at: > > > > How often are you flushing? > > > > How often are you force flushing from HLog rolls? > > > > What size are your flushes? > > > > What does your region count look like as that can affect your flush size? > > > > etc > > > > When I see HBase blocking constantly it is usually a sign that you need > to > > do some fine grain tuning. > > > > On Mon, Feb 4, 2013 at 7:45 AM, kzurek < > > > kzurek@ > > > > wrote: > > > >> I'm facing some issues regarding to major compaction. I've disabled > major > >> compaction and it is not triggered manually, but when I'm loading data > to > >> selected region, I saw that major compaction queue is growing and it is > >> being triggered ('Large Compaction' in logs) quite often (mainly due to > >> cacheFlusher). Moreover, I've noticed that my clients app gets timeout > >> while > >> putting data into the cluster (happens when memory store flusher is > >> trying > >> to dump memory content to store file, but it cannot due to too many > store > >> files), also drop in data rate, which in this case is obvious, is > >> noticeable. For me, it looks like compaction process is not fast enough Kevin O'Dell Customer Operations Engineer, Cloudera
+
Kevin O'dell 2013-02-05, 14:20
-
Re: MemStoreFlusher: region has too many store files (client timeout)
kzurek 2013-02-05, 14:52
Thanks, but I've already read that article and I was trying to set up settings accordingly. But any type of help could be useful. Here are logs (compressed with tar gzip): hbase_logs.tar.gz < http://apache-hbase.679495.n3.nabble.com/file/n4037960/hbase_logs.tar.gz> Kevin O'dell wrote > kzurek, > > Okay, that is a fair flush size(I have not seen larger than 256MB). Do > you think you could send me a RS log from yesterday while you were seeing > this issue? I would just like to thumb through it and make some tuning > recommendations. I have a pretty good idea of what you need to do, if you > look at: > > http://gbif.blogspot.com/2012/07/optimizing-writes-in-hbase.html The > article has some good ideas for write tuning. > > On Tue, Feb 5, 2013 at 3:39 AM, kzurek < > kzurek@ > > wrote: > >> Thanks for the reply, although I should clear some misunderstandings. In >> general, I do know general behavior and difference between minor and >> major >> compaction, as well as when minor compaction might become (could be >> promoted) to major compaction. I just wanted to verify influence of >> compaction (mostly major) on our cluster performance. Thus, I've created >> test where I'm putting data to only one region (total 71) by one single >> threaded client using build in caching mechanism (according to "HBase The >> Definitive Guide" book) and triggering major compaction by hand >> (HBaseAdmin). Although, after few tests I've noticed that major >> compaction >> (Large Compaction) is being triggered (cache flusher, recursive queue) so >> I >> left it as it was (not triggering it anymore). That brought me to this >> situation, where I'm putting data and after a while I'm getting timeouts >> on >> the client, in meanwhile I see that memstore is being flush which cant >> create new store file (cause there are to many of them) and which is >> frequently blocked by compaction process. I hope that this short >> description >> will bring closer look at the issue. In addition, here are some answers >> to >> your questions: >> >> 1. How often are you flushing? >> I'm not triggering flushing by hand, but I've noticed that data is being >> flushed every 4s (275m) or 1m 30s-40s (1.5g). >> 2. How often are you force flushing from HLog rolls? >> Default settings are: blocksize=64 MB, rollsize=60.8 MB, enabled=true, >> optionallogflushinternal=1000ms. It seems that roll is made every hour. >> 3. What size are your flushes? >> Depends, from 275m up to 1.5g. I've set my memstore flush size to 256m >> and >> memstore block multiplier to 6. Should I increase the flush size?? >> 4. What does your region count look like as that can affect your flush >> size? >> Initial split is 37 regions on 6 RegionServers, but at the moment there >> are >> 71 regions. >> >> >> >> Kevin O'dell wrote >> > Kzurek, >> > >> > Just because you turn off time based major compactions, it does not >> mean >> > that you have turned major compaction off. Compaction can still be >> > promoted to be Majors. Also, the only real difference between a major >> and >> > minor compaction is one processes deletes. You should really schedule >> at >> > least daily major compactions. As for your blocking issue, there are >> > quite >> > a few things you would want to look at: >> > >> > How often are you flushing? >> > >> > How often are you force flushing from HLog rolls? >> > >> > What size are your flushes? >> > >> > What does your region count look like as that can affect your flush >> size? >> > >> > etc >> > >> > When I see HBase blocking constantly it is usually a sign that you need >> to >> > do some fine grain tuning. >> > >> > On Mon, Feb 4, 2013 at 7:45 AM, kzurek < >> >> > kzurek@ >> >> > > wrote: >> > >> >> I'm facing some issues regarding to major compaction. I've disabled >> major >> >> compaction and it is not triggered manually, but when I'm loading data >> to >> >> selected region, I saw that major compaction queue is growing and it >> is >> >> being triggered ('Large Compaction' in logs) quite often (mainly due View this message in context: http://apache-hbase.679495.n3.nabble.com/MemStoreFlusher-region-has-too-many-store-files-client-timeout-tp4037887p4037960.htmlSent from the HBase User mailing list archive at Nabble.com.
+
kzurek 2013-02-05, 14:52
|
|