Norbert Burger 2012-08-17, 19:17
If you want to customize the bulkloader, you can write your own mapper
to define the business logic for loading, and then specify that mapper
when running importtsv.
Refer to this link: http://hbase.apache.org/book.html#importtsv
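For reference, a typical two-step bulk load looks roughly like the sketch below. Table name, column spec, and HDFS paths are placeholders, and the exact class names and the custom-mapper hook (`importtsv.mapper.class`) vary by HBase version -- that option may not be available in 0.90.x, so treat this as a sketch rather than a recipe for CDH3u3:

```shell
# Step 1: run ImportTsv, writing HFiles to an HDFS output dir
# (importtsv.bulk.output) instead of sending Puts to the live table.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2 \
  -Dimporttsv.bulk.output=/user/me/hfiles \
  mytable /user/me/input

# Step 2: hand the generated HFiles to the table's regions
# (older releases expose this as the 'completebulkload' tool).
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  /user/me/hfiles mytable
```

Because step 2 moves finished HFiles into place rather than writing through the client API, it avoids the WAL and memstore entirely, which is why the bulk-load path is so much gentler on a loaded cluster.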
On Fri, Aug 17, 2012 at 12:17 PM, Norbert Burger
> Hi folks -- we're running CDH3u3 (0.90.4). I'm trying to export data
> from an existing table that has far too many regions (2600+ for only 8
> regionservers) into one with a more reasonable region count for this
> cluster (256). Overall data volume is approx. 3 TB.
> I thought initially that I'd use the bulkload/importtsv approach, but
> it turns out this table's schema has column qualifiers made from
> timestamps, so it's impossible for me to specify a list of target
> columns for importtsv. From what I can tell, the TSV interchange
> format requires your data to have the same colquals throughout.
> I took a look at CopyTable and Export/Import, which both appear to
> wrap the Hbase client API (emitting Puts from a mapper). But I'm
> seeing significant performance problems with this approach, to the
> point that I'm not sure it's feasible. Export appears to work OK, but
> when I try importing the data back from HDFS, the rest of our cluster
> drags to a halt -- client writes (even those not associated with the
> Import) start timing out. Fwiw, import already disables autoFlush
> (via TableOutputFormat).
> From http://hbase.apache.org/book/perf.writing.html, one option I
> could try would be to disable the WAL. Are there other techniques I
> should try? Has anyone implemented a bulkloader which doesn't use the
> TSV format?
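For context, the Export/Import pair under discussion is driven from the command line roughly as follows (table names and HDFS paths are placeholders). The Import job replays Puts through the normal client write path, which is why it competes with other clients; pre-splitting the target table before importing at least spreads that load across regionservers:

```shell
# Sketch: dump 'oldtable' to a SequenceFile directory on HDFS,
# then replay it into a separately created 'newtable'.
hbase org.apache.hadoop.hbase.mapreduce.Export oldtable /backup/oldtable

# Create 'newtable' pre-split (e.g. to ~256 regions) in the shell first,
# then replay the exported data into it.
hbase org.apache.hadoop.hbase.mapreduce.Import newtable /backup/oldtable
```

Disabling the WAL on the replayed Puts (the suggestion from the perf.writing page) trades durability for throughput: a regionserver crash mid-import loses unflushed data, so the import would need to be re-run from the export files.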
Thanks & Regards,
Norbert Burger 2012-08-21, 17:41
Michael Segel 2012-08-18, 11:14
Norbert Burger 2012-08-21, 18:14