HBase user mailing list: Creating a Table using HFileOutputFormat

Renaud Delbru 2010-09-23, 11:16
Stack 2010-09-23, 16:13
Renaud Delbru 2010-09-23, 16:50
Stack 2010-09-23, 18:22
Renaud Delbru 2010-09-23, 18:25
Renaud Delbru 2010-09-24, 11:54
Ted Yu 2010-09-24, 15:55
Re: Creating a Table using HFileOutputFormat
 On 24/09/10 16:55, Ted Yu wrote:
>  From TotalOrderPartitioner:
>        K[] splitPoints = readPartitions(fs, partFile, keyClass, conf);
>        if (splitPoints.length != job.getNumReduceTasks() - 1) {
> Partition list can be empty if you use 1 reducer.
> But this is not what you want I guess.
Yes, that is not what we want, since we want to create multiple regions.
However, we just found that the Hadoop library provides a tool for this
task, InputSampler. It samples an arbitrary dataset and writes the
partition split points. We will try this approach first. My guess is
that, even though these partitions are only an approximation, they
should be fine for HBase. The regions will not all be the same size, but
that should not be a problem, since HBase will split the larger regions
into smaller ones first. Can somebody confirm this assumption?
Renaud Delbru
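The sampling approach described in this message (sample the keys, derive numReduceTasks - 1 split points, then range-partition every key by binary search) can be illustrated without Hadoop on the classpath. This is a minimal sketch assuming String row keys compared lexicographically; the class and method names (RangePartitionSketch, chooseSplitPoints, checkSplitCount, getPartition) are hypothetical and are not the InputSampler or TotalOrderPartitioner API. Only the split-count check mirrors the TotalOrderPartitioner code quoted above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;

/**
 * Hypothetical, self-contained sketch (not Hadoop's or HBase's API) of the idea behind
 * sampling-based total-order partitioning: sample keys, choose numPartitions - 1 split
 * points, then route every key to a partition by binary search over those points.
 */
public class RangePartitionSketch {

    /** Picks numPartitions - 1 strictly increasing split points from a key sample. */
    static List<String> chooseSplitPoints(List<String> sample, int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be >= 1");
        }
        // Sort and de-duplicate so that the split points are strictly increasing.
        List<String> distinct = new ArrayList<>(new TreeSet<String>(sample));
        if (numPartitions > 1 && distinct.size() < numPartitions) {
            throw new IllegalArgumentException("sample has too few distinct keys");
        }
        double step = distinct.size() / (double) numPartitions;
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numPartitions; i++) {
            // step >= 1 here, so the indices are distinct and never exceed size - 1.
            splits.add(distinct.get((int) Math.round(step * i)));
        }
        return splits;
    }

    /** Same invariant as the quoted TotalOrderPartitioner check: one split fewer than reducers. */
    static void checkSplitCount(List<String> splitPoints, int numReduceTasks) {
        if (splitPoints.size() != numReduceTasks - 1) {
            throw new IllegalStateException("Wrong number of partitions: " + splitPoints.size());
        }
    }

    /** Each split point is treated as the first key of the next partition. */
    static int getPartition(List<String> splitPoints, String key) {
        int pos = Collections.binarySearch(splitPoints, key);
        // Exact hit on split i -> partition i + 1; a miss returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        int numReduceTasks = 4;
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            keys.add(String.format("row-%05d", i));
        }
        // Random sample with replacement; duplicates are dropped in chooseSplitPoints.
        Random rnd = new Random(42);
        List<String> sample = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            sample.add(keys.get(rnd.nextInt(keys.size())));
        }
        List<String> splits = chooseSplitPoints(sample, numReduceTasks);
        checkSplitCount(splits, numReduceTasks);

        // Count how many keys land in each partition.
        int[] counts = new int[numReduceTasks];
        for (String key : keys) {
            counts[getPartition(splits, key)]++;
        }
        System.out.println("split points: " + splits);
        for (int p = 0; p < numReduceTasks; p++) {
            System.out.println("partition " + p + ": " + counts[p] + " keys");
        }
    }
}
```

Running main prints the chosen split points and the number of keys per partition. Because the split points come from a sample, the partitions are roughly, not exactly, equal in size, which is the approximation discussed above; the intent in this thread is that each partition would become one region.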