Import was run as an M/R job on the same configuration as the export (15 nodes, 5 tasks per node). The nodes are 8 cores with 23GB of total RAM (6GB allocated to the HBase region server). As far as I could tell, everything was running fairly balanced, and HBase was the bottleneck due to all of the compaction.
Actually, an HBase export-to-bulk-load facility sounds like a great idea. We have been using bulk loads to migrate data from an older data store and they have worked great for us. It also doesn't seem like it would be that hard to implement. So what am I missing?
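For what it's worth, here is a rough sketch of what I had in mind, built on the 0.90 mapreduce classes. The class names (ExportToHFiles, CellEmitter) and the argument order are mine, not anything that exists today; the idea is just to read the SequenceFiles that Export writes and turn them into HFiles for a bulk load, instead of replaying every cell as a Put through the write path:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical "export to bulk load" job: reads the <ImmutableBytesWritable,
// Result> SequenceFiles produced by org.apache.hadoop.hbase.mapreduce.Export
// and writes HFiles instead of issuing Puts.
public class ExportToHFiles {

  // Re-emit each cell of an exported Result as a KeyValue so the
  // HFileOutputFormat machinery can sort and write it.
  static class CellEmitter
      extends Mapper<ImmutableBytesWritable, Result, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context ctx)
        throws IOException, InterruptedException {
      for (KeyValue kv : result.raw()) {
        ctx.write(row, kv);
      }
    }
  }

  // args: <export dir> <hfile output dir> <table name>
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "export-to-hfiles");
    job.setJarByClass(ExportToHFiles.class);
    job.setMapperClass(CellEmitter.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Sets the sorting reducer, the partitioner, and the output format so
    // the HFiles line up with the table's current region boundaries.
    HTable table = new HTable(conf, args[2]);
    HFileOutputFormat.configureIncrementalLoad(job, table);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The resulting directory would then get handed to the completebulkload tool. Whether the sort in the reduce phase ends up cheaper than 20+ hours of flushing and compacting is the open question, but it skips the write path entirely.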
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Stack
Sent: Monday, February 20, 2012 4:29 PM
To: [EMAIL PROTECTED]
Subject: Re: export/import for backup
On Mon, Feb 20, 2012 at 1:20 PM, Paul Mackles <[EMAIL PROTECTED]> wrote:
> We are on HBase 0.90.4 (CDH3u2). We are using the standard HBase export/import for backups. In a test run, our imports ran extremely slowly. While a full export of our dataset took about an hour, the corresponding import took 20+ hours (for 216 regions across 15 servers). While it finished, I am a little uncomfortable with that sort of recovery time should disaster strike. Are there any recommendations for speeding up imports in a recovery scenario? One thing I noticed while watching the region-server logs was that there were a lot of compactions happening during the import (both major and minor). Should we disable compactions while the import is running and then do it all at the end? We have our region size set to 100GB right now so we can manage splitting. Thanks in advance for any recommendations.
Can you tell where it was spending the time, Paul? Upping the config so
there's less flushing sounds like a good way to go. You might want to
do stuff like large flush sizes when importing so flushes are bigger.
How did you import? An MR job? Was it running full on? Was HBase what
was keeping it slow?
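Concretely, the knobs to look at for the duration of the import would be something like the hbase-site.xml fragment below. The values are guesses to be tuned against your cluster, not recommendations: larger memstore flushes, a raised compaction trigger, and time-based major compactions turned off.

```xml
<!-- Flush memstores at 256MB instead of the 64MB default, so the
     import produces fewer, larger store files. -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>268435456</value>
</property>
<!-- Don't trigger a minor compaction until many more store files
     pile up (default is 3). -->
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>10</value>
</property>
<!-- Raise the point at which writes block waiting on compactions
     (default is 7). -->
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>30</value>
</property>
<!-- Disable time-based major compactions; run one manually from the
     shell once the import finishes. -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
```

After the import completes, these would go back to normal values and a manual major_compact would clean up the accumulated store files.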
Has anyone played with going from an export to a bulk load? I wonder if
that would run faster.
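For anyone who hasn't used it, the tail end of the bulk load flow in 0.90 is a single command once a job has produced HFiles; the path and table name below are placeholders:

```shell
# Move HFiles produced by an HFileOutputFormat job into the table's
# regions (adjust the jar name to match your HBase version).
hadoop jar $HBASE_HOME/hbase-0.90.4.jar completebulkload /tmp/export-hfiles mytable
```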