HBase, mail # user - Creating a Table using HFileOutputFormat


Re: Creating a Table using HFileOutputFormat
Renaud Delbru 2010-09-24, 16:12
 On 24/09/10 16:55, Ted Yu wrote:
>  From TotalOrderPartitioner:
>        K[] splitPoints = readPartitions(fs, partFile, keyClass, conf);
>        if (splitPoints.length != job.getNumReduceTasks() - 1) {
> Partition list can be empty if you use 1 reducer.
>
> But this is not what you want I guess.
Yes, this is not what we want, since we want to create x regions.
But we just found that the Hadoop library has a tool for this task,
InputSampler. It samples an arbitrary dataset and creates the
partition splits. We will try this approach first. My guess is that,
even if these partitions are only an approximation, it should be OK for
HBase. The regions will not be exactly the same size, but that
should not be a problem, since the larger regions will be the first ones
HBase splits into smaller regions. Can somebody confirm this assumption?
--
Renaud Delbru
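
For readers following the thread, here is a minimal, self-contained sketch of the idea behind sampling-based split points. It is not Hadoop's InputSampler code: the class name SplitPointSketch and the methods splitPoints and partitionFor are hypothetical, and keys are assumed to be plain strings sorted in natural order. It sorts a sample of keys, picks (number of partitions - 1) evenly spaced split points, which is the count the TotalOrderPartitioner check quoted above expects, and then maps a key to its partition with a binary search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop or HBase.
class SplitPointSketch {

    // Returns numPartitions - 1 split points taken at evenly spaced
    // positions of the sorted sample.
    static List<String> splitPoints(List<String> sample, int numPartitions) {
        if (numPartitions < 1 || sample.size() < numPartitions) {
            throw new IllegalArgumentException(
                "need at least one sampled key per partition");
        }
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numPartitions; i++) {
            int idx = (int) ((long) i * sorted.size() / numPartitions);
            splits.add(sorted.get(idx));
        }
        return splits;
    }

    // Returns the partition index for a key: the number of split points
    // that are less than or equal to the key (assumes distinct split points).
    static int partitionFor(String key, List<String> splits) {
        int pos = Collections.binarySearch(splits, key);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }
}
```

With one partition the list of split points is empty, which matches the single-reducer case Ted describes. Because the split points come from a sample, the resulting partitions are only approximately equal in size.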