HBase >> mail # user >> Bulk Loading - LoadIncrementalHFiles


Bulk Loading - LoadIncrementalHFiles
Hi everyone,

I'm using MR to bulk load into HBase: I configure the job with
HFileOutputFormat.configureIncrementalLoad, and after the job
completes I call LoadIncrementalHFiles.doBulkLoad.

From what I see, the MR job outputs a file for each CF written, and to my
understanding these files are loaded as store files into a region.

What I don't understand is *how many regions will be opened*, and *how is
that determined*?
If I have 3 CFs and a lot of data to load, does that mean 3 large store
files will be loaded into 1 region (or more?), and that region will split on
major compaction?

Can I pre-create regions and tell the bulk load to split the data between
them during the load?
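
To make the pre-split question concrete, here is a minimal sketch in plain
Java (no HBase dependency) of the idea: given a sorted list of region start
keys, each row key belongs to the region with the greatest start key that
does not exceed it. The class name, method names, and start keys are
hypothetical, and String comparison stands in for HBase's byte-wise row key
ordering (equivalent only for plain ASCII keys). It illustrates the question
and is not how LoadIncrementalHFiles itself is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PreSplitSketch {
    // Hypothetical start keys of four pre-created regions, sorted ascending.
    // The first region starts at the empty key, so every row key has a home.
    static final String[] REGION_START_KEYS = {"", "g", "n", "t"};

    // Returns the index of the region whose range [startKey, nextStartKey)
    // contains rowKey. Assumes startKeys is sorted ascending.
    static int regionIndexFor(String rowKey, String[] startKeys) {
        int index = 0;
        for (int i = 0; i < startKeys.length; i++) {
            if (rowKey.compareTo(startKeys[i]) >= 0) {
                index = i; // last start key that is <= rowKey wins
            }
        }
        return index;
    }

    // Groups row keys by the region index they would fall into.
    static Map<Integer, List<String>> groupByRegion(List<String> rowKeys,
                                                    String[] startKeys) {
        Map<Integer, List<String>> grouped = new TreeMap<>();
        for (String rowKey : rowKeys) {
            int region = regionIndexFor(rowKey, startKeys);
            grouped.computeIfAbsent(region, k -> new ArrayList<>()).add(rowKey);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> rowKeys = Arrays.asList("apple", "melon", "zebra", "grape");
        Map<Integer, List<String>> grouped =
                groupByRegion(rowKeys, REGION_START_KEYS);
        for (Map.Entry<Integer, List<String>> entry : grouped.entrySet()) {
            System.out.println("region " + entry.getKey() + " <- " + entry.getValue());
        }
    }
}
```

What I would like to know is whether the bulk load does the equivalent of
this grouping against the table's existing regions, or whether it is up to me
to arrange it.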

In general, if someone could elaborate about LoadIncrementalHFiles it would
save me a lot of time diving into it.
Another question I have is about overwriting values: is it possible to load
an updated value, or more generally to update columns and values for an
existing key?
I'd think that there's no problem, but when I run the same bulk load twice
(MR and then load) with the same data, the second run fails.
Right after mapreduce.LoadIncrementalHFiles: Trying to load hfile=........
I get: ERROR mapreduce.LoadIncrementalHFiles: Unexpected execution
exception during splitting...
Thanks!