-
HBase bulk loaded region can't be splitted
Bruce Bian 2012-05-11, 02:29
I use importtsv to generate HFiles from my data:
hadoop jar hbase-0.92.1.jar importtsv -Dimporttsv.bulk.output=/outputs/mytable.bulk -Dimporttsv.columns=HBASE_ROW_KEY,ns: -Dimporttsv.separator=, mytable /input
Then I use completebulkload to load the generated HFiles into my table:
hadoop jar hbase-0.92.1.jar completebulkload /outputs/mytable.bulk mytable
However, the table is large (4.x GB) and still has only one region. Why doesn't HBase split it into multiple regions? It clearly exceeds the split threshold (256 MB).
/hbase/mytable/71611409ea972a65b0876f953ad6377e/ns:
To split it manually, I tried the Split button in the HBase web UI. Unfortunately, it shows:
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Region mytable,,1334215360439.71611409ea972a65b0876f953ad6377e. not splittable because midkey=null
I have more data to load, about 300 GB. No matter how much data I load, the table still has only one region, and that region is still not splittable. Any ideas?
-
Re: HBase bulk loaded region can't be splitted
Bryan Beaudreault 2012-05-11, 02:56
I haven't done bulk loads using the importtsv tool, but I imagine it works similarly to the mapreduce bulk load tool we are provided. If so, the following stands.
In order to do a bulk load you need to have a table ready to accept the data. The bulk load does not create regions, but only puts data into the right place based on existing regions. Since you only have 1 region to start with, it makes sense that they would all go to that one region. You should find a way to calculate the regions that you want and create your table with pre-created regions. Then re-run the import.
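One way to "calculate the regions that you want" is sketched below. It is only an illustration: it assumes the row keys are spread roughly uniformly over an 8-digit hexadecimal prefix (for example, hashed keys) and that the loaded data is about as large as the figures quoted in this thread. Neither assumption is confirmed here, and the function names `regions_needed` and `split_points` are hypothetical.

```shell
# Rough region count: total data size divided by the split threshold,
# rounded up. Both arguments are in MB.
regions_needed() (
  total_mb=$1
  max_region_mb=$2
  echo $(( (total_mb + max_region_mb - 1) / max_region_mb ))
)

# Print the N-1 evenly spaced 8-digit hex boundaries that divide the
# key space 00000000..ffffffff into N regions.
# Assumption: keys are uniformly distributed over that hex space.
split_points() (
  regions=$1
  i=1
  while [ "$i" -lt "$regions" ]; do
    printf '%08x\n' $(( 4294967296 * i / regions ))
    i=$(( i + 1 ))
  done
)

# 300 GB of data with a 256 MB split threshold:
regions_needed 307200 256   # prints 1200

# Boundaries for a 4-region table:
split_points 4              # prints 40000000, 80000000, c0000000 (one per line)
```

If the keys are not uniformly distributed, evenly spaced boundaries will leave some regions much larger than others, and the boundaries would have to be sampled from the data instead.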
-
Re: HBase bulk loaded region can't be splitted
Bruce Bian 2012-05-11, 03:07
Yes, I understand that. But after I complete the bulk load, shouldn't it trigger the region server to split that region in order to meet the hbase.hregion.max.filesize criterion? When I tried to split the region manually using the web UI, nothing happened. Instead, the following message appeared in the region server log:
Region mytable,,1334215360439.71611409ea972a65b0876f953ad6377e. not splittable because midkey=null
-
Re: HBase bulk loaded region can't be splitted
Subir S 2012-05-12, 08:27
Wouldn't major_compact trigger a split, if the region really needs to split?
However, if you want to pre-split regions for your table, you can use the RegionSplitter utility as below:
$ export HADOOP_CLASSPATH=`hbase classpath`; hbase org.apache.hadoop.hbase.util.RegionSplitter
This prints the usage message.
A sample invocation is:
hbase org.apache.hadoop.hbase.util.RegionSplitter -c 10 'mytable' -f ns
-
Re: HBase bulk loaded region can't be splitted
Yifeng Jiang 2012-05-13, 02:00
Hi,
You need to create your table with pre-split regions.
$ hbase org.apache.hadoop.hbase.util.RegionSplitter -c 10 -f column_family your_table
This command will pre-create 10 regions in your table using MD5 strings as region boundaries. You can also customize the splitting algorithm. Please see
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.html
-Yifeng
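Putting the replies together, the order of operations matters: the table has to be pre-split before importtsv generates the HFiles, because the bulk load only places data into regions that already exist. The dry-run sketch below just prints the three commands from this thread in that order rather than running them. It assumes the table has not been created yet (RegionSplitter with -c creates it) and that MD5-style boundaries suit the row keys. The function name `print_bulk_load_plan` and its arguments are placeholders, not part of HBase.

```shell
# Dry run: print the commands in the order suggested in the thread.
# Arguments: table, column family, region count, input dir, HFile output dir.
print_bulk_load_plan() (
  table=$1
  family=$2
  regions=$3
  input=$4
  bulk_dir=$5

  # 1. Create the table with pre-split regions.
  echo "hbase org.apache.hadoop.hbase.util.RegionSplitter -c ${regions} -f ${family} ${table}"

  # 2. Generate HFiles against the pre-split table.
  echo "hadoop jar hbase-0.92.1.jar importtsv -Dimporttsv.bulk.output=${bulk_dir} -Dimporttsv.columns=HBASE_ROW_KEY,${family}: -Dimporttsv.separator=, ${table} ${input}"

  # 3. Move the generated HFiles into the table's regions.
  echo "hadoop jar hbase-0.92.1.jar completebulkload ${bulk_dir} ${table}"
)

# Values taken from the original post; the region count of 10 comes from the replies.
print_bulk_load_plan mytable ns 10 /input /outputs/mytable.bulk
```

Reviewing the printed commands before running them by hand makes it easy to confirm that the pre-split step comes first.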