a question storefileIndexSize
Gaojinchao 2011-05-24, 12:29
My observation is that storefileIndexSize is large. Is there a way to reduce it?
Region server metric: requests=11447, regions=10394, stores=10394, storefiles=3103, storefileIndexSize=3717, memstoreSize=1002, compactionQueueSize=1234, flushQueueSize=0, usedHeap=6916, maxHeap=8165, blockCacheSize=1394662632, blockCacheFree=317661976, blockCacheCount=53394, blockCacheHitCount=16229024, blockCacheMissCount=91803814, blockCacheEvictedCount=22381853, blockCacheHitRatio=15, blockCacheHitCachingRatio=41
Re: a question storefileIndexSize
Stack 2011-05-24, 17:00
Do what Ted says, or you could change the HFile block size; currently it's 64k. Make it bigger? Do you have big keys and small values? If so, can you make do with smaller keys? That would help with index size too.
St.Ack
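A rough model of why both knobs help: the HFile block index holds about one entry per data block, and each entry carries that block's first key. Index size therefore scales with (uncompressed data size / block size) × key size. The sketch below is back-of-the-envelope arithmetic, not HBase code. `estimate_index_bytes` is a hypothetical helper, and both the 12 bytes of per-entry bookkeeping (assumed: an 8-byte offset plus a 4-byte block size) and the 60-byte key are assumptions that ignore JVM object overhead.

```python
def estimate_index_bytes(data_bytes: int, block_size: int, key_bytes: int,
                         entry_overhead: int = 12) -> int:
    """Rough size of an HFile block index (hypothetical model, not HBase code).

    Assumes one index entry per data block, each holding the block's first
    key plus `entry_overhead` bytes of bookkeeping (assumed 8-byte offset +
    4-byte block size). JVM object overhead is ignored.
    """
    # Number of data blocks, rounding up for a final partial block.
    num_blocks = -(-data_bytes // block_size)
    return num_blocks * (key_bytes + entry_overhead)


MB = 1024 * 1024
KB = 1024

# Hypothetical store file: 512 MB of uncompressed data, 60-byte keys.
base = estimate_index_bytes(512 * MB, 64 * KB, 60)
bigger_blocks = estimate_index_bytes(512 * MB, 256 * KB, 60)
shorter_keys = estimate_index_bytes(512 * MB, 64 * KB, 30)

print(base, bigger_blocks, shorter_keys)  # → 589824 147456 344064
```

In this model, quadrupling the block size divides the index by four, while halving the key shrinks each entry but leaves the number of entries unchanged.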
Re: a question storefileIndexSize
Gaojinchao 2011-05-25, 01:55
Stack, thanks for your reply. The block size is the default. My key length is 26 bytes and my values are 300~400 bytes. Does that count as big keys and small values?
Re: a question storefileIndexSize
Stack 2011-05-25, 03:57
Looks like you have 'small' keys.
It looks like the index is about 1MB per storefile (storefiles=3103, storefileIndexSize=3717). Does this seem about right? What size are your regions?
St.Ack
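Stack's figure follows from dividing the two metrics. The sketch below also relates the index to the heap; it assumes storefileIndexSize is reported in MB, like usedHeap and maxHeap, which the metrics dump itself does not state.

```python
# Figures copied from the region server metrics in the original question.
# Assumption: storefileIndexSize and maxHeap are both in MB.
storefiles = 3103
storefile_index_mb = 3717
max_heap_mb = 8165

index_mb_per_file = storefile_index_mb / storefiles
index_share_of_heap = storefile_index_mb / max_heap_mb

print(f"{index_mb_per_file:.2f} MB of index per storefile")  # → 1.20 MB ...
print(f"{index_share_of_heap:.0%} of max heap held by storefile indexes")  # → 46% ...
```

If the units assumption holds, storefile indexes occupy close to half of the region server's maximum heap.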
Re: a question storefileIndexSize
Gaojinchao 2011-05-25, 07:54
Region size is 512M
hbase.regionserver.handler.count 50
hbase.regionserver.global.memstore.upperLimit 0.4
hbase.regionserver.global.memstore.lowerLimit 0.35
hbase.hregion.memstore.flush.size 128M
hbase.hregion.max.filesize 512M
hbase.client.scanner.caching 1
hfile.block.cache.size 0.2
hbase.hregion.memstore.block.multiplier 3
hbase.hstore.blockingStoreFiles 10
hbase.hstore.compaction.min.size 64M
compress: gz
dfs.block.size 256M
Re: a question storefileIndexSize
Matt Corgan 2011-05-25, 14:47
I have a table that compresses by 30x using gzip, so the default block size of 64 KB was writing 2 KB blocks to disk. To reduce storefileIndexSize, I raised the block size to 256 KB, presumably writing ~8 KB disk blocks, which is still pretty small. Maybe you could go even higher depending on your compression ratio.
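Matt's numbers follow from the configured block size applying to uncompressed data. A sketch of the arithmetic: `on_disk_block_kb` and `blocks_per_file` are hypothetical helpers, and `blocks_per_file` assumes the file-size limit (e.g. hbase.hregion.max.filesize) is measured on compressed, on-disk bytes, so a 30x ratio packs 30x as many blocks, and therefore index entries, into each file.

```python
import math


def on_disk_block_kb(block_size_kb: float, compression_ratio: float) -> float:
    # The configured block size counts uncompressed bytes, so a block that
    # compresses `compression_ratio` times is roughly this big on disk.
    return block_size_kb / compression_ratio


def blocks_per_file(file_mb_on_disk: int, block_size_kb: int,
                    compression_ratio: float) -> int:
    # Assumption: the file-size limit applies to on-disk (compressed) bytes,
    # so the uncompressed payload is file size * compression ratio.
    raw_kb = file_mb_on_disk * 1024 * compression_ratio
    return math.ceil(raw_kb / block_size_kb)


for block_kb in (64, 256):
    disk_kb = on_disk_block_kb(block_kb, 30)
    blocks = blocks_per_file(512, block_kb, 30)
    print(f"{block_kb} KB blocks at 30x: ~{disk_kb:.1f} KB on disk, "
          f"{blocks} index entries per 512 MB file")
```

Under these assumptions, a 512 MB file at 30x needs 245760 index entries with 64 KB blocks and 61440 with 256 KB blocks.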
BTW, why 10394 regions with only 3103 storefiles?
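The metrics alone bound the answer to Matt's question. Every store that holds flushed data has at least one storefile, so with stores=10394 and storefiles=3103, at least 7291 stores on this server have no storefile at all. This is only a lower bound; the arithmetic says nothing about why those stores are empty.

```python
# Figures copied from the region server metrics in the original question.
stores = 10394
storefiles = 3103

# A store with data on disk holds at least one storefile, so at most
# `storefiles` stores can be non-empty; the rest have no files at all.
min_empty_stores = max(stores - storefiles, 0)
print(min_empty_stores)  # → 7291
```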
Re: a question storefileIndexSize
Matt Corgan 2011-05-25, 15:09
Also, how long are your column family names and column qualifiers? They are added to each row key in the index, so you want to make them as short as possible.
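This shows why name length matters. Each cell's key, and therefore each block index entry (which holds a block's first key), is the full KeyValue key: a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp and a 1-byte key type. A sketch of the length calculation follows. `keyvalue_key_length` is a hypothetical helper, and the family and qualifier names are made up for illustration.

```python
# Fixed-size fields of an HBase KeyValue key.
ROW_LENGTH_SIZE = 2      # short
FAMILY_LENGTH_SIZE = 1   # byte
TIMESTAMP_SIZE = 8       # long
KEY_TYPE_SIZE = 1        # byte


def keyvalue_key_length(row: bytes, family: bytes, qualifier: bytes) -> int:
    """Length of the key part of a KeyValue (hypothetical helper)."""
    return (ROW_LENGTH_SIZE + len(row)
            + FAMILY_LENGTH_SIZE + len(family)
            + len(qualifier)
            + TIMESTAMP_SIZE + KEY_TYPE_SIZE)


row = b"x" * 26  # 26-byte row key, as in the original question
verbose = keyvalue_key_length(row, b"content", b"description")  # made-up names
terse = keyvalue_key_length(row, b"c", b"d")
print(verbose, terse)  # → 56 40
```

With a 26-byte row, shortening a 7-byte family and an 11-byte qualifier to one byte each removes 16 bytes from every key, roughly 29%.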
Re: a question storefileIndexSize
Stack 2011-05-25, 21:20
Good point, Matt. I forgot about compression. Let me add a note to the above-referenced section in the book. St.Ack
Re: a question storefileIndexSize
Matt Corgan 2011-05-25, 23:49
I was thinking it would be a nice feature if, each time an HFile was written, it kept a count of the raw bytes (before compression) to make it easy to compare to the file size on disk. It could report it in the web interface next to the disk size.
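Matt's proposed counter amounts to tracking raw bytes next to compressed bytes. The sketch below computes that ratio for a batch of records using Python's zlib (DEFLATE, the algorithm gzip uses). It only illustrates the measurement and is not the HFile writer; the sample records are hypothetical. Dividing the configured block size by the measured ratio approximates the on-disk block size.

```python
import zlib


def compression_ratio(records: list[bytes]) -> float:
    """Raw (pre-compression) bytes divided by DEFLATE-compressed bytes."""
    raw = b"".join(records)
    compressed = zlib.compress(raw)
    return len(raw) / len(compressed)


# Hypothetical, highly repetitive ~330-byte values.
records = [b"status=OK;region=eu-west;payload=" + b"0" * 300 for _ in range(1000)]
ratio = compression_ratio(records)
print(f"raw/compressed = {ratio:.1f}x")
```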
Re: a question storefileIndexSize
Stack 2011-05-26, 00:03
It's logged, IIRC.
Please open an issue, Matt, to add this facility. St.Ack
Re: a question storefileIndexSize
Stack 2011-05-25, 03:59
Oh, I forgot about this suggestion: http://hbase.apache.org/book.html#keysize I mention it because it cites a study done by Marc Limotte, who had a similarly large storefile index and dug in. You might be interested in how he did his research. St.Ack
Re: a question storefileIndexSize
Ted Yu 2011-05-24, 15:58
See https://issues.apache.org/jira/browse/HBASE-3857 and https://issues.apache.org/jira/browse/HBASE-3856

Cheers