We are trying to create an HBase table from scratch using MapReduce and
HFileOutputFormat. However, we haven't really found examples or tutorials
on how to do this, and some aspects are still unclear to us. We are using
HBase 0.20.x.
First, what is the correct way to use HFileOutputFormat to create the table?
We are simply using a map function which outputs <ImmutableBytesWritable
(key), Put (value)> pairs, an identity reducer, and we configure the job
to use HFileOutputFormat as its output format class.
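Roughly, our current job setup looks like the sketch below (mapper and
table names are placeholders of ours; the rest of the configuration is
elided):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkCreateJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "hfile-bulk-create");
    job.setJarByClass(BulkCreateJob.class);

    // MyBulkLoadMapper is a placeholder for our mapper, which emits
    // <ImmutableBytesWritable, Put> pairs.
    job.setMapperClass(MyBulkLoadMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);

    // Identity reducer: the base Reducer class passes records through.
    job.setReducerClass(Reducer.class);

    // Write HFiles instead of going through the HBase write path.
    job.setOutputFormatClass(HFileOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```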
However, we have seen that HBase 0.89.x does it in a more complex way,
involving a sorting reducer (KeyValueSortReducer or PutSortReducer) and a
partitioner (TotalOrderPartitioner). There, HFileOutputFormat provides a
convenience method, configureIncrementalLoad, to automatically configure
the Hadoop job. Is this method needed in our case? Or is it only
necessary when the table already exists (incremental bulk load)?
Do we have to reimplement this for 0.20.x?
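For comparison, here is how we understand the 0.89.x convenience method
would be used (a sketch only; the table name is a placeholder, and as far
as we can tell the table must already exist so that its region boundaries
can be read):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IncrementalLoadJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "hfile-incremental-create");
    job.setJarByClass(IncrementalLoadJob.class);
    job.setMapperClass(MyBulkLoadMapper.class); // placeholder mapper

    // "mytable" is a placeholder; the table must exist, since
    // configureIncrementalLoad inspects its regions to set up
    // TotalOrderPartitioner, the sort reducer (PutSortReducer or
    // KeyValueSortReducer), and one reduce task per region.
    HTable table = new HTable(conf, "mytable");
    HFileOutputFormat.configureIncrementalLoad(job, table);

    FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```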
Then, once the HFile creation job has completed successfully, how do we
import the HFiles into HBase? Is it by using the HBase CLI import command?
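The only two import mechanisms we have found so far are the loadtable.rb
script shipped in the bin directory of 0.20.x, and the bulk-load tool of
0.89.x; is either of these what we should be using? (Sketch below; table
name and output path are placeholders.)

```shell
# 0.20.x: JRuby script shipped with HBase, run via the hbase launcher.
bin/hbase org.jruby.Main bin/loadtable.rb mytable /tmp/hfiles

# 0.89.x: the LoadIncrementalHFiles bulk-load tool.
bin/hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
    /tmp/hfiles mytable
```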
Thanks in advance for your answers,
Stack 2010-09-23, 16:13
Renaud Delbru 2010-09-23, 16:50
Stack 2010-09-23, 18:22
Renaud Delbru 2010-09-23, 18:25
Renaud Delbru 2010-09-24, 11:54
Ted Yu 2010-09-24, 15:55
Renaud Delbru 2010-09-24, 16:12