-
Meaning of storefileIndexSize
Renaud Delbru 2010-05-17, 14:27
Hi,
I would like to understand the meaning of the storefileIndexSize metric. Could someone point me to a definition or explain what it means?
Also, we are performing a large table import (90M rows, with row sizes varying from hundreds of KB to 8 MB), and we are running into memory problems (OOME). My observation is that it always happens after a while, once storefileIndexSize becomes large (> 500). Is there a way to reduce it?
Thanks, -- Renaud Delbru
-
Re: Meaning of storefileIndexSize
Renaud Delbru 2010-05-18, 09:15
Hi,
After some tuning, such as increasing the hfile block size to 128KB, I have noticed that storefileIndexSize is now half of what it was before (~250). Is storefileIndexSize the size of the in-memory hfile block index?
Thanks -- Renaud Delbru
-
Re: Meaning of storefileIndexSize
Stack 2010-05-18, 15:51
On Tue, May 18, 2010 at 2:15 AM, Renaud Delbru <[EMAIL PROTECTED]> wrote:
> Hi,
>
> after some tuning, like increasing the hfile block size to 128KB, I have
> noticed that the storefileIndexSize is now half of what it was before
> (~250). Do storefileIndexSize is the size of the in-memory hfile block index
> ?
Yes.
So, yes, doubling the block size should halve the index size.
How come your index is so big? Do you have big keys? Lots of data? Lots of storefiles?
Looking in HRegionServer I see that it's calculated like so:
storefileIndexSizeMB = (int)(store.getStorefilesIndexSize()/1024/1024);
In the Store, we do this:
  /**
   * @return The size of the store file indexes, in bytes.
   */
  long getStorefilesIndexSize() {
    long size = 0;
    for (StoreFile s: storefiles.values()) {
      Reader r = s.getReader();
      if (r == null) {
        LOG.warn("StoreFile " + s + " has a null Reader");
        continue;
      }
      size += r.indexSize();
    }
    return size;
  }

The indexSize comes from the HFile metadata.
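To make the arithmetic of the metric concrete, here is a minimal, self-contained sketch. It is not the HBase source: the class and method names are hypothetical stand-ins for the code quoted above, and the per-file index sizes are made-up numbers. It only shows the byte-to-MB conversion, which uses integer division and therefore rounds down.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the metric arithmetic quoted above; not HBase code.
class StorefileIndexMetric {

    /** Sums per-file index sizes, in bytes (plays the role of getStorefilesIndexSize()). */
    static long totalIndexBytes(List<Long> perFileIndexBytes) {
        long size = 0;
        for (long bytes : perFileIndexBytes) {
            size += bytes;
        }
        return size;
    }

    /** Same conversion as the quoted HRegionServer line: integer division, so it rounds down. */
    static int toMetricMB(long bytes) {
        return (int) (bytes / 1024 / 1024);
    }

    public static void main(String[] args) {
        // Made-up index sizes for three store files.
        List<Long> sizes = Arrays.asList(700_000L, 900_000L, 600_000L);
        long total = totalIndexBytes(sizes);
        System.out.println(total + " bytes -> " + toMetricMB(total) + " MB");
    }
}
```

One consequence of the truncation is that any total below 1 MB shows up as 0 in the metric.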
St.Ack
-
Re: Meaning of storefileIndexSize
Renaud Delbru 2010-05-18, 16:04
Hi Stack,
On 18/05/10 16:51, Stack wrote:
>> after some tuning, like increasing the hfile block size to 128KB, I have
>> noticed that the storefileIndexSize is now half of what it was before
>> (~250). Do storefileIndexSize is the size of the in-memory hfile block index
>> ?
>>
> Yes.
>
> So, yes, doubling the block size should halve the index size.
>
> How come your index is so big? Do you have big keys? Lots of data?
> Lots of storefiles?
>

We have 90M rows; each row varies from a few hundred kilobytes to 8MB.
At the same time I also changed another parameter, hbase.hregion.max.filesize. It was set to 1GB (from a previous test), and I switched it back to the default value (256MB). So, in the previous tests, there were few region files (around 250) but a very large index size (>500).
In my last test (hregion.max.filesize=256MB, block size=128KB), the number of region files increased (I now have more than 1000 region files), but the index size is now less than 200.
Do you think hregion.max.filesize could have had an impact on the index size?
> Looking in HRegionServer I see that its calculated so:
>
> storefileIndexSizeMB = (int)(store.getStorefilesIndexSize()/1024/1024);
>

So, storefileIndexSize indicates the number of MB of heap used by the index. And in our case, 500 was excessive given that our region server is limited to 1GB of heap.
Thanks. -- Renaud Delbru
-
Re: Meaning of storefileIndexSize
Stack 2010-05-18, 16:31
On Tue, May 18, 2010 at 9:04 AM, Renaud Delbru <[EMAIL PROTECTED]> wrote:
>> How come your index is so big? Do you have big keys? Lots of data?
>> Lots of storefiles?
>>
>
> We have 90M of rows, each rows varies from a few hundreds of kilobytes to
> 8MB.
>
Index keeps the 'key' that starts each block in an hfile and its offset where the 'key' is a combination of row+column+timestamp (not the value). Your 'keys' are large?
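A back-of-the-envelope sketch of what that implies for index size, under stated assumptions: one index entry per block, a block count of roughly data size divided by block size, and a fixed per-entry overhead. The overhead constant, the data volume, and the average key size below are illustrative guesses, not values measured from HBase, and the model assumes values are small relative to the block size (with multi-megabyte values the real block count may differ).

```java
// Rough model of block index size; all constants are illustrative assumptions.
class BlockIndexEstimate {

    // Assumed per-entry bookkeeping (offset, lengths, object overhead); not an HBase constant.
    static final long ASSUMED_ENTRY_OVERHEAD_BYTES = 64;

    /** Approximate number of blocks: data size divided by block size, rounded up. */
    static long estimateBlocks(long dataBytes, long blockSizeBytes) {
        return (dataBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    /** One index entry per block: the block's first key plus the assumed overhead. */
    static long estimateIndexBytes(long dataBytes, long blockSizeBytes, long avgKeyBytes) {
        return estimateBlocks(dataBytes, blockSizeBytes)
                * (avgKeyBytes + ASSUMED_ENTRY_OVERHEAD_BYTES);
    }

    public static void main(String[] args) {
        long dataBytes = 100L * 1024 * 1024 * 1024; // hypothetical 100 GB of store files
        long avgKeyBytes = 100;                     // hypothetical row+column+timestamp key size
        for (long blockKb : new long[] {64, 128}) {
            long indexBytes = estimateIndexBytes(dataBytes, blockKb * 1024, avgKeyBytes);
            System.out.println(blockKb + "KB blocks -> ~" + (indexBytes / 1024 / 1024) + " MB of index");
        }
    }
}
```

In this model the index depends on total data, block size, and key size, but not on how the data is split across files, which matches the expectation that changing only the max file size should leave the index about the same.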
> I have also changed at the same time another parameter, the
> hbase.hregion.max.filesize. It was set to 1GB (from previous test), and I
> switch it back to the default value (256MB).
> So, in the previous tests, there was a few number of region files (like
> 250), but a very large index file size (>500).
>
> In my last test (hregion.max.filesize=256, block size=128K), the number of
> region files increased (I have now more than 1000 region file), but the
> index file size is now less than 200.
>
> Do you think the hregion.max.filesize could had impact on the index file
> size ?
>
Hmm. You have same amount of "data" just more files because you lowered max filesize (by a factor of 4 so 4x the number of files) so I'd expect that index would be of the same size.
If inclined to do more digging, you can use the hfile tool:
./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile
Do the above and you'll get usage. Print out the metadata on hfiles. Might help you figure out what's going on.
>> Looking in HRegionServer I see that its calculated so:
>>
>> storefileIndexSizeMB = (int)(store.getStorefilesIndexSize()/1024/1024);
>>
>
> So, storefileIndexSize indicates the number of MB of heap used by the index.
> And, in our case, 500 was too excessive given the fact that our region
> server is limited to 1GB of heap.
>
If 1GB only, then yeah, big indices will cause a prob. How many regions per regionserver? Sounds like you have a few? If so, can you add more servers? Or up the RAM in your machines?
Yours, St.Ack
-
Re: Meaning of storefileIndexSize
Renaud Delbru 2010-05-18, 16:45
On 18/05/10 17:31, Stack wrote:
> On Tue, May 18, 2010 at 9:04 AM, Renaud Delbru<[EMAIL PROTECTED]> wrote:
>
>> We have 90M of rows, each rows varies from a few hundreds of kilobytes to
>> 8MB
> Index keeps the 'key' that starts each block in an hfile and its
> offset where the 'key' is a combination of row+column+timestamp (not
> the value). Your 'keys' are large?
>

Our row keys are just plain web document URLs. Column names are a few characters. So I would say they are fairly small.

>> I have also changed at the same time another parameter, the
>> hbase.hregion.max.filesize. It was set to 1GB (from previous test), and I
>> switch it back to the default value (256MB).
>> So, in the previous tests, there was a few number of region files (like
>> 250), but a very large index file size (>500).
>>
>> In my last test (hregion.max.filesize=256, block size=128K), the number of
>> region files increased (I have now more than 1000 region file), but the
>> index file size is now less than 200.
>>
>> Do you think the hregion.max.filesize could had impact on the index file
>> size ?
>>
> Hmm. You have same amount of "data" just more files because you
> lowered max filesize (by a factor of 4 so 4x the number of files) so
> I'd expect that index would be of the same size.
>

OK, so it is just the change of block size that reduces the index size.

> If inclined to do more digging, you can use the hfile tool:
>
> ./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile
>
> Do the above and you'll get usage. Print out the metadata on hfiles.
> Might help you figure whats going on.
>

I'll have a look at this.

>> So, storefileIndexSize indicates the number of MB of heap used by the index.
>> And, in our case, 500 was too excessive given the fact that our region
>> server is limited to 1GB of heap
> If 1GB only, then yeah, big indices will cause a prob. How many
> regions per regionserver? Sounds like you have a few? If so, can you
> add more servers? Or up the RAM in your machines?
>

Yes, we have four nodes, and each node currently has approximately 280 region files. We are not able to increase the number of nodes or the RAM for the moment, so our solution was to tune HBase for our setup.

In the end, HBase seems to handle it well. Using the new configuration settings, I was able to import our 90M rows in less than 11 hours (using a map-reduce job on the same cluster), while keeping the used heap of the region servers relatively small (300 to 500MB). The region servers now look stable, with a relatively small used heap, even when I use the HBase table as a map-reduce input format.

So it seems that the memory problem was related to the hfile block size.
-- Renaud Delbru
-
Re: Meaning of storefileIndexSize
Renaud Delbru 2010-05-19, 12:32
Hi Stack,
One last question: is it possible (or will it be possible) to define a limit on the maximum memory used by the store file index, as is possible for the memory store? From what I understand, the store file index currently grows (linearly?) with the amount of data stored in HBase. So my tuning of the hfile block size is working for the moment, but if we double the amount of data in HBase, we will run into the same problem again.
-- Renaud Delbru
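The concern about linear growth can be put into numbers with a small sketch. It assumes the index holds one entry per block and that the block count is roughly data size divided by block size; the class name, the bytes-per-entry figure, the data volumes, and the heap budget are all hypothetical. Under those assumptions, the smallest block size that keeps the index within a fixed budget scales linearly with the amount of data.

```java
// Hypothetical helper: smallest block size that keeps a block index within a byte budget.
class IndexBudget {

    /**
     * Model: index ~= (dataBytes / blockSize) * bytesPerIndexEntry <= indexBudgetBytes,
     * hence blockSize >= dataBytes / (indexBudgetBytes / bytesPerIndexEntry).
     */
    static long minBlockSizeBytes(long dataBytes, long bytesPerIndexEntry, long indexBudgetBytes) {
        long maxEntries = indexBudgetBytes / bytesPerIndexEntry;
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("budget is smaller than a single index entry");
        }
        return (dataBytes + maxEntries - 1) / maxEntries; // round up
    }

    public static void main(String[] args) {
        long budget = 128L * 1024 * 1024;   // hypothetical 128 MB of heap for indexes
        long entryBytes = 128;              // assumed key plus overhead per index entry
        long oneHundredGb = 100L * 1024 * 1024 * 1024;
        System.out.println("100 GB -> block size >= "
                + minBlockSizeBytes(oneHundredGb, entryBytes, budget) + " bytes");
        System.out.println("200 GB -> block size >= "
                + minBlockSizeBytes(2 * oneHundredGb, entryBytes, budget) + " bytes");
    }
}
```

So, without a cap on index memory, doubling the data would mean doubling the block size again (or the heap) just to stay where you are.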
-
Re: Meaning of storefileIndexSize
Jonathan Gray 2010-05-19, 15:08
There is an open jira to add block indexes and bloom filters into the LRU (this would add limits to both). On my phone so don't have the # off hand.
After blooms get committed to trunk I will work on implementing that.
-
Re: Meaning of storefileIndexSize
Renaud Delbru 2010-05-19, 16:34
Thanks Jonathan, good to know that such a feature is on the way.
Cheers -- Renaud Delbru
-
RE: Meaning of storefileIndexSize
Jonathan Gray 2010-05-19, 17:18
FYI the jira is here: https://issues.apache.org/jira/browse/HBASE-2500