Jean-Marc Spaggiari
2013-01-22, 11:42
Anoop Sam John
2013-01-22, 12:24
ramkrishna vasudevan
2013-01-22, 13:38
Jean-Marc Spaggiari
2013-01-22, 13:47
ramkrishna vasudevan
2013-01-22, 14:02
Jean-Marc Spaggiari
2013-01-22, 14:10
Jean-Marc Spaggiari
2013-01-23, 02:39
Anoop Sam John
2013-01-23, 06:17
Jean-Marc Spaggiari
2013-01-23, 12:26
ramkrishna vasudevan
2013-01-23, 18:09
Jean-Marc Spaggiari
2013-01-23, 18:24
-
HBase split policy
Jean-Marc Spaggiari 2013-01-22, 11:42
Hi,

I'm wondering what the HBase split policy is.

Let's imagine this situation. I have a region full of rows with keys from AA to AZ, hundreds of thousands of them. I also have a few rows from B to DZ, let's say only one hundred.

The region is just under the maxfilesize, so it's fine.

Now I add "A" and store a very big row in it, almost half the size of my maxfilesize value. That means it's now time to split this region.

How will HBase decide where to split it? Is it going to use the lexical order, which means it will split somewhere between B and C? If it's done that way, I will have one VERY small region and one VERY big one, which will still be over the maxfilesize and will need to be split again, most probably many times, right?

Or will HBase take the middle of the region, look at the closest key, and cut there?

Yesterday, for one table, I merged all my regions into a single one. This gave me something like a 10GB region. Since I want to have at least 100 regions for this table, I set the maxfilesize to 100MB. I restarted HBase and let it work overnight.

This morning, I have some very big regions, still over 100MB, and some very small ones. The big regions are at least a hundred times bigger than the small ones.

I just stopped the cluster again to re-merge the regions into a single one and check whether I did something wrong in the process. In the meantime, I'm looking for more information about how HBase decides where to cut, and whether there is a way to customize that.

Thanks,

JM

PS: The numbers are from memory. I don't really recall how big the last region was yesterday. I will take more notes when the current MassMerge is done.
-
RE: HBase split policy
Anoop Sam John 2013-01-22, 12:24
Jean, good topic.

When a region splits, it is the HFile(s) that get split. As you know, an HFile is logically divided into "n" HFileBlocks, and every HFile carries index metadata for these blocks. HBase finds the midkey from this block index data: it takes the middle block as the split point. So it all depends on how the data is spread across the HFileBlocks. When you split a region [a,e), it need not be split at point "c". It depends on how much data you have for each rowkey pattern.

One more thing to remember is that sometimes there can be really big HFileBlocks. Even though the default size for a block is 64K, a block can sometimes be much larger than this. One row cannot be split into 2 or more blocks; it needs to be in one block. So it can happen that, when a split happens, the bigger blocks go to one daughter, leaving that region still big (when one row is really huge compared to the others).

Some thoughts on the topic, as per my limited knowledge of the code.

-Anoop-
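To make the mid-block explanation concrete, here is a minimal Java sketch. It is not HBase code: it assumes a simplified block index that stores only each block's first key and size, and the names `Block`, `midKey` and `bytes` are hypothetical. It picks the first key of the middle block, as described above, and shows that with one oversized row the two halves hold the same number of blocks but very different numbers of bytes. The real HFileBlockIndex logic is more involved.

```java
import java.util.ArrayList;
import java.util.List;

class MidKeySketch {
    // Hypothetical, simplified stand-in for one block-index entry.
    static final class Block {
        final String firstKey;
        final long sizeBytes;

        Block(String firstKey, long sizeBytes) {
            this.firstKey = firstKey;
            this.sizeBytes = sizeBytes;
        }
    }

    // Split point = first key of the middle block (by block count, not by bytes).
    static String midKey(List<Block> index) {
        return index.get(index.size() / 2).firstKey;
    }

    // Total bytes of the blocks in positions [from, to).
    static long bytes(List<Block> index, int from, int to) {
        long total = 0;
        for (int i = from; i < to; i++) {
            total += index.get(i).sizeBytes;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Block> index = new ArrayList<>();
        // One huge row "A" that fills a single oversized block (50 MB).
        index.add(new Block("A", 50L * 1024 * 1024));
        // Nine ordinary 64 KB blocks for the remaining rows.
        for (int i = 0; i < 9; i++) {
            index.add(new Block("AA" + i, 64L * 1024));
        }
        int mid = index.size() / 2;
        System.out.println("split key   = " + midKey(index));
        System.out.println("left bytes  = " + bytes(index, 0, mid));
        System.out.println("right bytes = " + bytes(index, mid, index.size()));
    }
}
```

In this toy layout the daughter that receives the oversized block stays far larger than the other one, which matches the behaviour described in the reply.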
-
Re: HBase split policy
ramkrishna vasudevan 2013-01-22, 13:38
Hi Jean

Before replying with what I know: region splits can be configured too.

OK, now on how the split happens:
-> You can explicitly ask for the region to be split on a specific row key, if you know that splitting on that rowkey will yield almost equal region sizes.
-> When HBase itself decides to split, it just takes the midkey from the HFiles. Here the midkey is the first key in the middle block of the HFile.

Also, individual rows cannot be split. So if one row is nearly the size of the region and the other rows are small, HBase still looks for the middle block inside the HFile; the block holding that row is going to be huge, and it may end up as a region of its own. This has to do with the internals of the splitting code.

Regards
Ram
-
Re: HBase split policy
Jean-Marc Spaggiari 2013-01-22, 13:47
Hi Anoop, Hi Ram,

Thanks for your replies.

I looked at the code and found in HFileBlockIndex the midkey function, which does the computation used by the Store.getSplitPoint() method.

Now, if all the keys are almost equal in size, and the table has only one big 10GB region, then if we lower the maxfilesize parameter to something like 300MB, we should see only regions of almost equal size, right? That's not the result I got, so I'm trying to figure out where I'm wrong.

Also, one last thing. If I want to change the default behaviour and split based on the number of rows instead of the midkey, can I hook in somewhere?

Or will I have to disable the default split (by setting the maxfilesize to something like 20GB) and run a job to split the regions manually?

Thanks,

JM
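As a rough check on that expectation, the sketch below works through the arithmetic in Java. It assumes an idealised case in which every split cuts a region exactly in half by size and a region keeps splitting while it is larger than maxfilesize. The class and method names are hypothetical, and none of this is HBase code.

```java
class HalvingSketch {
    // Number of rounds of splitting until every region is <= maxFileSizeMb,
    // assuming each split halves the region exactly (an idealisation).
    static int rounds(double regionMb, double maxFileSizeMb) {
        int rounds = 0;
        while (regionMb > maxFileSizeMb) {
            regionMb /= 2;
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        double startMb = 10 * 1024; // one 10GB region
        double maxMb = 300;         // maxfilesize of 300MB
        int r = rounds(startMb, maxMb);
        long regions = 1L << r;     // 2^r regions after r rounds
        System.out.println(r + " rounds, " + regions
                + " regions of " + (startMb / regions) + " MB each");
    }
}
```

Under these assumptions the 10GB region goes 10240 → 5120 → 2560 → 1280 → 640 → 320 → 160 MB, so it ends up as 64 regions of 160MB. Even in the ideal case the regions settle at roughly half the configured maximum, not at exactly 300MB.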
-
Re: HBase split policy
ramkrishna vasudevan 2013-01-22, 14:02
>>Also, last thing. If I want to change the default behaviour and split
>>based on the row number instead of the midkey, can I hook somewhere?

HTableDescriptor myHtd = new HTableDescriptor();
myHtd.setValue(HTableDescriptor.SPLIT_POLICY,
    KeyPrefixRegionSplitPolicy.class.getName());

So I suppose the region split policy can be changed only during table creation. (I may be wrong; I'm not sure whether there is any other way.)

When I said split based on row key, my point was to use admin.split(rowkey). I will look more into your calculations and figures and get back to you.

Regards
Ram
-
Re: HBase split policy
Jean-Marc Spaggiari 2013-01-22, 14:10
Hi Ram,

I see SPLIT_POLICY is defined the same way MAX_FILESIZE is, so I think it's a table attribute and can be altered. That's good news! I will probably try it.

In the meantime, admin.split(rowkey) is what I will use until I'm able to properly set the SPLIT_POLICY. I will simply (try to) count the rows in a region and split in the middle.

Thanks for the hint regarding the SPLIT_POLICY.

JM
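The "count the rows and split in the middle" idea can be sketched as below. This is plain Java over an in-memory sorted list of row keys, not HBase client code: in practice the keys would come from a scan of the region, and the resulting key would be the one handed to admin.split(rowkey). The helper `middleRowKey` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class RowCountSplitSketch {
    // Returns the key that leaves half of the rows (rounded down) before it.
    // Assumes 'sortedKeys' is in row-key order, as a region scan would return them.
    static String middleRowKey(List<String> sortedKeys) {
        if (sortedKeys.size() < 2) {
            throw new IllegalArgumentException("need at least two rows to split");
        }
        return sortedKeys.get(sortedKeys.size() / 2);
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("A", "AA", "AB", "AC", "B", "C");
        System.out.println("split at row key: " + middleRowKey(keys));
    }
}
```

Note that this balances the number of rows per daughter, not the number of bytes, so it only gives equal-sized regions when the rows are of similar size.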
-
Re: HBase split policy
Jean-Marc Spaggiari 2013-01-23, 02:39
Another related question.

What will trigger the split?

I mean, I merge all the regions into a single one, split that into 4 regions of 2.5GB, alter the table to set the maxsize to 300MB, and enable the table. I don't do anything else. No put, no get. What will trigger the region splits?

I have one small table, about 1.2GB with 8M rows. I merged it into a single region and set the maxsize to 12MB. It got almost completely split: all the regions got split except one.

Here is the screenshot:
http://imageshack.us/photo/my-images/834/hannibalb.png/

It's not the first region, nor the last. There is nothing specific about this region, and it's not getting split.

Any idea why, and how I can trigger the split without putting any data into the table?

Thanks,

JM
-
RE: HBase split policy
Anoop Sam John 2013-01-23, 06:17
>What will trigger the split?

The things which can trigger a split:
1. An explicit split call from the client side using the admin API
2. A memstore flush
3. A compaction

So even when there are no write operations happening on the region (no flushes), a compaction performed on that region can still trigger a split. Maybe in your case a compaction happened for some of the regions and resulted in splits.

-Anoop-
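A simplified view of the size check that a flush or compaction can lead to is sketched below. It assumes a constant-size rule (split when any store in the region exceeds maxfilesize). As a further assumption that is not stated in this thread, it treats a store that still holds reference files from an earlier split as not eligible until a compaction has rewritten them. All names are hypothetical, and this is not the HBase implementation.

```java
import java.util.Arrays;
import java.util.List;

class SplitCheckSketch {
    // Hypothetical, simplified view of one store (column family) in a region.
    static final class StoreInfo {
        final long sizeBytes;
        final boolean hasReferenceFiles;

        StoreInfo(long sizeBytes, boolean hasReferenceFiles) {
            this.sizeBytes = sizeBytes;
            this.hasReferenceFiles = hasReferenceFiles;
        }
    }

    // Check run after a flush or compaction: should this region be split?
    static boolean shouldSplit(List<StoreInfo> stores, long maxFileSize) {
        boolean foundBigStore = false;
        for (StoreInfo store : stores) {
            if (store.hasReferenceFiles) {
                return false; // assumption: must be compacted before splitting again
            }
            if (store.sizeBytes > maxFileSize) {
                foundBigStore = true;
            }
        }
        return foundBigStore;
    }

    public static void main(String[] args) {
        long maxFileSize = 12L * 1024 * 1024; // maxsize of 12MB
        List<StoreInfo> compacted = Arrays.asList(new StoreInfo(40L * 1024 * 1024, false));
        List<StoreInfo> withRefs = Arrays.asList(new StoreInfo(40L * 1024 * 1024, true));
        System.out.println("compacted region splits: " + shouldSplit(compacted, maxFileSize));
        System.out.println("region with references splits: " + shouldSplit(withRefs, maxFileSize));
    }
}
```

The point of the sketch is only that the check runs as a consequence of a flush, a compaction or an explicit request, so an idle region that is over the limit stays as it is until one of those happens.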
-
Re: HBase split policy
Jean-Marc Spaggiari 2013-01-23, 12:26
Hi Anoop,

I ran another major_compact and the split is now completely done.

The question is why it was not done initially, when I ran the first major_compact. No idea.

I will re-merge the table into one single region and re-compact to see if I can reproduce that.

JM
-
Re: HBase split policy
ramkrishna vasudevan 2013-01-23, 18:09
>>This morning, I have some very big regions, still over the 100MB, and
>>some very small. And the big regions are at least hundred times bigger
>>than the small one.

For the regions that were bigger than 100MB (much bigger), what data was in them? Were there any hefty rows? Please check.

Regarding the major_compact that did not trigger the split, could you check the logs? They may give us some idea, and the calculations can be worked out from that.

Regards
Ram
-
Re: HBase split policy
Jean-Marc Spaggiari 2013-01-23, 18:24
It's all VERY small data. Each entry is 4 bytes followed by a string of less than 256 bytes, and there is no real data (one byte of data).

I merged the regions again and this time the split went well. I looked in the logs and did not find anything special.

Right now I have an MR job that will be running for a few hours, so I can't retry. But I will later: I will activate the debug logs for the split classes and give it another try.

JM