-
export/import for backup
Paul Mackles, 2012-02-20, 21:20
We are on hbase 0.90.4 (CDH3u2). We are using the standard hbase export/import for backups. In a test run, our imports ran extremely slowly: a full export of our dataset took about an hour, but the corresponding import took 20+ hours (for 216 regions across 15 servers). It did finish, but I am a little uncomfortable with that sort of recovery time should disaster strike.
Are there any recommendations for speeding up imports in a recovery scenario? One thing I noticed while watching the region-server logs was that a lot of compactions (both major and minor) were happening during the import. Should we disable compactions while the import is running and then run them all at the end? We have our region size set to 100GB right now so we can manage splitting.
Thanks in advance for any recommendations.
-- Paul Mackles, Senior Manager, Adobe
-
Re: export/import for backup
Stack, 2012-02-20, 21:28
Can you tell where it was spending the time, Paul? Changing the config so there is less flushing sounds like it might be a good way to go. For example, you might want to use a larger flush size when importing so that each flush is larger.
How did you import? As an MR job? Was it running full on? Was HBase what was keeping it slow?
Has anyone played with going from an export to a bulk load? I wonder if this would run faster.
St.Ack
-
RE: export/import for backup
Paul Mackles, 2012-02-20, 21:58
The import was run as an M/R job on the same configuration as the export (15 nodes, 5 tasks per node). Nodes have 8 cores and 23GB of total RAM (6GB for the hbase RS). As far as I could tell, everything was running pretty balanced, and hbase was the bottleneck due to all of the compaction.
Actually, an hbase export to "bulk load" facility sounds like a great idea. We have been using bulk loads to migrate data from an older data store and they have worked awesome for us. It also doesn't seem like it would be that hard to implement. So what am I missing?
Paul
-
Re: export/import for backup
Stack, 2012-02-21, 05:19
On Mon, Feb 20, 2012 at 1:58 PM, Paul Mackles <[EMAIL PROTECTED]> wrote:
> Actually, an hbase export to "bulk load" facility sounds like a great idea. We have been using bulk loads to migrate data from an older data store and they have worked awesome for us. It also doesn't seem like it would be that hard to implement. So what am I missing?
Little?
Check out Import.java in the mapreduce package. See how it's pulling from SequenceFiles into a map that outputs to a TableOutputFormat from inside the map. Make a new MR job that has the same input but outputs to HFileOutputFormat instead (you'll need the total order partitioner and a reducer in the mix, which Import doesn't have).
St.Ack
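The role of the total order partitioner mentioned above can be illustrated with a minimal, stdlib-only sketch. It only shows the idea of routing each row key to the reducer that owns its key range, so that every reducer writes sorted output for exactly one region. The class and method names are hypothetical and are not part of HBase or Hadoop; a real job would rely on the partitioner and output format that ship with those projects.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not an HBase or Hadoop class. */
public class RegionPartitionSketch {

    /** Converts a readable key to bytes (UTF-8 is used only to keep the example legible). */
    public static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Lexicographic comparison that treats each byte as unsigned. */
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /**
     * Returns the partition index for rowKey, given split points sorted ascending.
     * n split points give n + 1 partitions; a key equal to a split point belongs
     * to the partition that starts at that split point.
     */
    public static int partitionFor(byte[][] splitPoints, byte[] rowKey) {
        int lo = 0;
        int hi = splitPoints.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareBytes(splitPoints[mid], rowKey) <= 0) {
                lo = mid + 1; // split point is at or before the key: look to the right
            } else {
                hi = mid;     // split point is after the key: look to the left
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Assumed split points for a table with four regions.
        byte[][] splits = { bytes("g"), bytes("n"), bytes("t") };
        for (String k : new String[] { "apple", "g", "mango", "zebra" }) {
            System.out.println(k + " -> partition " + partitionFor(splits, bytes(k)));
        }
    }
}
```

Because partitions follow key order, concatenating the reducers' outputs in partition order yields globally sorted data, which is what bulk-loadable files need.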
-
Re: export/import for backup
Jacques, 2012-02-21, 16:22
I was thinking about this and have a couple of thoughts...
While Stack's solution above would work, it means a couple of things:
1. If you haven't saved splits, you're going to have to figure out how to pre-split for a full restore.
2. You have to wait for the data re-sort at recovery time instead of backup time, so recovery time will be substantially longer.
It seems like we should make a new script, like export, that automatically exports the data as bulk importable along with all of the table's schema and split information. We could then make an import script that simply creates the backed-up table (potentially under a different target name) and then bulk imports it, pre-splitting using the splits defined on export. (We actually did something like this recently to migrate data from one format to another.)
It wouldn't work for the case where you are trying to do a merged restore (e.g. into a pre-existing table), but it seems like recovery would be really quick. I suppose you could allow it to support importing into an existing table, but then you may have to wait for splits on a bunch of the files (I know the bulk import script is designed to do this, but I'm not sure how it would handle a large number of splits if your target table has diverged substantially from when the backup was done).
Jacques
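The "save the splits at export time, reuse them at restore time" idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the file layout (one hex-encoded split key per line), and the temp-file name are all hypothetical and are not an existing HBase format or tool. Hex encoding is used because row keys are arbitrary bytes and may not be printable.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; the one-key-per-line hex layout is an assumption of this sketch. */
public class SplitPointsFile {

    /** Encodes a binary split key as lowercase hex. */
    public static String toHex(byte[] key) {
        StringBuilder sb = new StringBuilder(key.length * 2);
        for (byte b : key) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Decodes a hex string produced by toHex back into bytes. */
    public static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Backup side: writes the table's split keys next to the exported data. */
    public static void save(Path file, List<byte[]> splitPoints) {
        List<String> lines = new ArrayList<>();
        for (byte[] key : splitPoints) {
            lines.add(toHex(key));
        }
        try {
            Files.write(file, lines, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Restore side: reads the split keys back, in the order they were saved. */
    public static List<byte[]> load(Path file) {
        try {
            List<byte[]> keys = new ArrayList<>();
            for (String line : Files.readAllLines(file, StandardCharsets.US_ASCII)) {
                if (!line.isEmpty()) {
                    keys.add(fromHex(line));
                }
            }
            return keys;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "splits-sketch.txt");
        save(file, Arrays.asList(new byte[] { 0x00, (byte) 0xff },
                "row-500".getBytes(StandardCharsets.UTF_8)));
        for (byte[] key : load(file)) {
            System.out.println(toHex(key));
        }
    }
}
```

On restore, the loaded keys would be used to create the target table pre-split before the bulk import runs, so the imported files line up with region boundaries instead of waiting on splits.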
-
Re: export/import for backup
lars hofhansl, 2012-02-21, 17:27
It seems we could converge the import and importtsv tools. importtsv can write directly to a (live) table or use HFileOutputFormat.
-- Lars
-
Re: export/import for backup
lars hofhansl, 2012-02-22, 01:55
I filed HBASE-5440.
I am framing it more as import to bulk load, though. I.e. we run export as we do now, but on import one can choose to create HFiles for bulk load instead of updating the live cluster through the API.
-- Lars
-
Re: export/import for backup
lars hofhansl, 2012-02-24, 02:12
In HBASE-5440 I propose a patch that does exactly that.
It's a bit more complicated than one would think, since I wanted it to be able to deal with delete markers (and KEEP_DELETED_CELLS). Please have a look.
-- Lars
-
Re: export/import for backup
lars hofhansl, 2012-02-24, 07:27
Recovery from these exported HFiles should be extremely fast.
We can add an option to Export that exports to HFiles using HFileOutputFormat instead. Note that you will eat the sort CPU at export time, though, and exporting is (hopefully) more frequent than importing, so I'm not entirely sure that this is a good trade-off.
-- Lars