NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase >> mail # user >> HBase bulk loaded region can't be splitted


Re: HBase bulk loaded region can't be splitted
I haven't done bulk loads using the importtsv tool, but I imagine it works
similarly to the MapReduce bulk load tool we are provided with.  If so, the
following applies.

In order to do a bulk load you need to have a table ready to accept the
data.  The bulk load does not create regions; it only puts data into the
right place based on the regions that already exist.  Since you only have
1 region to start with, it makes sense that all of the data went to that
one region.  You should work out the region boundaries (split keys) that
you want and create your table with pre-created regions.  Then re-run the
import.
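
As a rough illustration of "calculate the regions that you want", here is a
minimal Python sketch.  It assumes you can pull a representative sample of
row keys from your input, that the keys are plain ASCII strings without
single quotes, and that your HBase shell accepts the SPLITS option on
create (check this for your version).  The function names and the sample
keys are made up for the example; the table name and family come from the
original question.

```python
def pick_split_keys(sample_keys, num_regions):
    """Return num_regions - 1 split keys that cut the sorted sample evenly.

    Assumption: keys are ASCII strings, so Python's string ordering matches
    the byte ordering HBase uses for row keys.
    """
    if num_regions < 2:
        return []  # a single region needs no split keys
    keys = sorted(set(sample_keys))
    if len(keys) < num_regions:
        raise ValueError("need at least as many distinct sample keys as regions")
    step = len(keys) / num_regions
    # Take the key at each 1/num_regions boundary of the sorted sample.
    return [keys[int(i * step)] for i in range(1, num_regions)]


def create_statement(table, family, split_keys):
    """Format an HBase shell 'create' command with pre-split regions."""
    quoted = ", ".join("'%s'" % k for k in split_keys)
    return "create '%s', '%s', {SPLITS => [%s]}" % (table, family, quoted)


if __name__ == "__main__":
    # Hypothetical sample; in practice, sample the row keys from /input.
    sample = ["row%04d" % i for i in range(1000)]
    print(create_statement("mytable", "ns", pick_split_keys(sample, 4)))
```

You would paste the printed statement into the HBase shell to create the
empty, pre-split table, then run importtsv and completebulkload against it.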

On Thu, May 10, 2012 at 10:50 PM, Bruce Bian <[EMAIL PROTECTED]> wrote:

> I use importtsv to write the data out as HFiles
>
> hadoop jar hbase-0.92.1.jar importtsv \
>   -Dimporttsv.bulk.output=/outputs/mytable.bulk \
>   -Dimporttsv.columns=HBASE_ROW_KEY,ns: \
>   -Dimporttsv.separator=, \
>   mytable /input
>
> Then I use completebulkload to load that bulk data into my table
>
> hadoop jar hbase-0.92.1.jar completebulkload /outputs/mytable.bulk mytable
>
> However, the table is very large (4.x GB), yet it has only one region.
> Why doesn't HBase split it into multiple regions? It exceeds the split
> threshold (256MB).
>
> /hbase/mytable/71611409ea972a65b0876f953ad6377e/ns:
>
> To split it, I tried the Split button in the HBase web UI. Sadly, it
> shows
>
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Region
> mytable,,1334215360439.71611409ea972a65b0876f953ad6377e. not
> splittable because midkey=null
>
> I have more data to load, about 300GB. No matter how much data I load,
> the table still has only one region, and that region is still not
> splittable. Any idea?
>