HBase, mail # user - HBase bulk loaded region can't be splitted


Re: HBase bulk loaded region can't be splitted
Bryan Beaudreault 2012-05-11, 02:56
I haven't done bulk loads using the importtsv tool, but I imagine it works
similarly to the MapReduce bulk load tool that HBase provides.  If so, the
following applies.

In order to do a bulk load you need to have a table ready to accept the
data.  The bulk load does not create regions; it only puts data into the
right place based on the regions that already exist.  Since you only have
1 region to start with, it makes sense that all of the data went to that
one region.  You should find a way to calculate the regions that you want
and create your table with pre-created regions.  Then re-run the import.
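
One way to "calculate the regions that you want" is to sample the row keys
from the input and pick boundaries at even quantiles.  The following is only
a rough sketch of that idea, not something from the HBase tools: the function
names are made up for illustration, it assumes plain ASCII row keys (so that
Python's string ordering matches HBase's byte ordering), and it assumes your
HBase shell accepts the SPLITS option on create, which you should verify
for your version.

```python
def compute_split_keys(sample_keys, num_regions):
    """Pick up to num_regions - 1 boundary keys at even quantiles.

    sample_keys: a representative sample of row keys (ASCII strings assumed).
    num_regions: how many regions the table should start with.
    """
    keys = sorted(set(sample_keys))
    if num_regions < 2 or not keys:
        return []
    splits = []
    for i in range(1, num_regions):
        idx = min(i * len(keys) // num_regions, len(keys) - 1)
        key = keys[idx]
        # Skip empty and duplicate boundaries; split keys must be strictly increasing.
        if key and (not splits or key > splits[-1]):
            splits.append(key)
    return splits


def format_create_command(table, family, splits):
    """Build an HBase shell 'create' statement (SPLITS syntax is an assumption)."""
    quoted = ", ".join("'%s'" % s for s in splits)
    return "create '%s', '%s', {SPLITS => [%s]}" % (table, family, quoted)


if __name__ == "__main__":
    # Hypothetical sample; in practice, read a sample of keys from /input.
    sample = ["row%03d" % i for i in range(100)]
    splits = compute_split_keys(sample, 4)
    print(format_create_command("mytable", "ns", splits))
```

With the table created this way, importtsv should write one set of HFiles per
region and completebulkload should place each set into its matching region.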

On Thu, May 10, 2012 at 10:50 PM, Bruce Bian <[EMAIL PROTECTED]> wrote:

> I use importtsv to load data as HFile
>
> hadoop jar hbase-0.92.1.jar importtsv
> -Dimporttsv.bulk.output=/outputs/mytable.bulk
> -Dimporttsv.columns=HBASE_ROW_KEY,ns: -Dimporttsv.separator=, mytable
> /input
>
> Then I use completebulkload to load those bulk data into my table
>
> hadoop jar hbase-0.92.1.jar completebulkload /outputs/mytable.bulk mytable
>
> However, the table is very large (4.x GB), and it has only one
> region. Why doesn't HBase split it into multiple regions? It exceeds
> the split threshold (256MB).
>
> /hbase/mytable/71611409ea972a65b0876f953ad6377e/ns:
>
> To split it, I tried the Split button in the HBase Web UI. Sadly, it
> shows
>
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Region
> mytable,,1334215360439.71611409ea972a65b0876f953ad6377e. not
> splittable because midkey=null
>
> I have more data to load, about 300GB. No matter how much data I
> load, there is still only one region, and it is still not splittable. Any
> ideas?
>