Manoj Murumkar 2011-06-08, 17:22
Hi,
We're trying to come up with the right strategy for backing up HBase tables. The assumption is that table sizes will not grow beyond a few hundred GB. Currently we run exports (writing directly onto the HDFS of another cluster), but this is taking too long (~5 hours to export ~5 GB of data). Are there any suggestions for improving the performance?
Thanks,
Manoj
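To put the numbers above in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes the export rate stays constant as the table grows, and it uses 300 GB as a stand-in for "a few hundred GB"; both are illustrative assumptions, not measurements from this cluster.

```python
def export_rate_mb_per_s(gigabytes, hours):
    """Average throughput in MB/s for an export of `gigabytes` taking `hours`."""
    return gigabytes * 1024 / (hours * 3600)


def hours_at_same_rate(target_gb, observed_gb, observed_hours):
    """Hours needed for `target_gb` if throughput stays at the observed rate."""
    return target_gb * observed_hours / observed_gb


if __name__ == "__main__":
    # Observed in the message above: ~5 GB in ~5 hours.
    print(f"observed rate: {export_rate_mb_per_s(5, 5):.2f} MB/s")
    # Hypothetical target size of 300 GB at the same rate.
    print(f"300 GB would take about {hours_at_same_rate(300, 5, 5):.0f} hours")
```

Well under 1 MB/s is far below what a cluster-to-cluster copy normally achieves, so the bottleneck is more likely in how the export job runs than in the volume of data.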
Joey Echeverria 2011-06-08, 22:47
Can you afford some down time? If so, you could minor compact, disable the table, distcp, and then enable the table.
-Joey
-- Joseph Echeverria Cloudera, Inc. 443.305.9434
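The offline procedure suggested above (compact, disable, distcp, enable) can be sketched as an ordered command plan. This is only a dry-run sketch: the function name, the table name, and the source and destination paths are hypothetical, and the assumption that the table's files live under `<hbase root>/<table>` in HDFS should be checked against your own layout. It builds the commands but does not execute them.

```python
def offline_backup_plan(table, src_root, dst_root):
    """Return the shell commands for an offline table backup, in order.

    src_root and dst_root are HDFS URIs of the HBase root directory on the
    source cluster and of the backup directory on the target cluster
    (hypothetical values; adjust to your clusters).
    """
    src = f"{src_root}/{table}"
    dst = f"{dst_root}/{table}"
    return [
        # Request a compaction so fewer store files need to be copied.
        f"echo \"compact '{table}'\" | hbase shell",
        # Take the table offline so its files stop changing during the copy.
        f"echo \"disable '{table}'\" | hbase shell",
        # Copy the table directory to the other cluster.
        f"hadoop distcp {src} {dst}",
        # Bring the table back online.
        f"echo \"enable '{table}'\" | hbase shell",
    ]


if __name__ == "__main__":
    plan = offline_backup_plan(
        "mytable",
        "hdfs://source-nn:8020/hbase",
        "hdfs://backup-nn:8020/backups/hbase",
    )
    for step in plan:
        print(step)
```

The table is unavailable between the disable and enable steps, so the downtime is roughly the duration of the distcp.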
Manoj Murumkar 2011-06-09, 00:24
We are trying to do this online as downtime is not an option. Good point, nonetheless.
Otis Gospodnetic 2011-06-09, 00:56
There is this post about HBase backup options: http://blog.sematext.com/2011/03/11/hbase-backup-options/ . I hope it helps.
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Manoj Murumkar 2011-06-09, 04:17
Thanks, I have seen it. Once I verify a viable solution, I will update this thread.
Ted Dunning 2011-06-09, 06:33
Otis,
We should talk some time about MapR. We did a test with Stack where we had an hbase instance with very active writes going on. We took successive snapshots with no interruption or pause in hbase operations and were able to demonstrate that each snapshot was usable to restore hbase to the state it had when the snapshot was taken.
Stack 2011-06-09, 06:52
On Wed, Jun 8, 2011 at 11:33 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> We did successive snapshots with no interruption or pause in hbase
> operations and were able to demonstrate the each snapshot was usable to
> restore hbase to the state it had when the snapshot was taken.

Yeah. Was kinda impressive. Snapshot appeared 'instantaneous'.
St.Ack
Eric Charles 2011-06-09, 07:12
Good news!
I suppose there's a risk of an "incoherent" backup.
I mean, with classical SQL databases, online backups ensure that the dataset taken can be restored to a state where all open transactions are committed. Even if the backup takes hours, the data backed up at the start is eventually updated to reflect the latest transactions.
With the MR process you describe, I guess we don't have this guarantee. Say an insert lands in Table_A after the Table_A snapshot has already been taken by the MR; we could then have a Table_B snapshot that mentions this last entry even though the Table_A snapshot does not contain it.
This is why I understand this process, even if fast, as a best-effort way to back up the data.
Please correct me if I'm wrong.
Thanks,
- Eric
Ted Dunning 2011-06-09, 07:49
On Thu, Jun 9, 2011 at 9:12 AM, Eric Charles <[EMAIL PROTECTED]> wrote:
> Good news!
>
> I suppose there's a risk of "incoherent" backup.

There would be, but we spent a ton of time making that not so. And the hbase devs have done a bunch of work making sure that the WAL works right.

> I mean, with classical sql databases, online-backups ensure that the taken
> dataset can be restored in a state where all open transactions are
> committed. Even if the backup takes hours, the initial backuped data is
> finally updated to reflect the last transactions.

The snapshot I mentioned is atomic. Really. That means that it is equivalent to having the same state as if all of the machines lost power simultaneously. Since hbase is now crash safe, the snapshot is, by definition and intent, restartable.

> With the MR process you describe, I guess we don't have this guarantee.
> Let's say, if an insert is achieved in Table_A and Table_A snapshot is
> already taken by the MR, we could have a Table_B snapshot that mention this
> last entry.

MapR. Not Map Reduce. See http://mapr.com/ for some sparse information. Come to hadoop summit for more information (see http://developer.yahoo.com/events/hadoopsummit2011/agenda.html#11 ).

> This is why I understand this process, even if fast, as a best-effort to
> backup the datas.

I don't think you are quite clear on what is happening.

> Please correct me if I'm wrong.

Done!
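The difference between the two backup styles discussed in this exchange can be illustrated with a toy model in Python. This is purely a sketch of the consistency argument: it uses in-memory lists rather than HBase, the function names are made up for illustration, and it says nothing about how MapR actually implements its snapshots.

```python
import copy


def take_backup(mode):
    """Simulate a backup taken while a client keeps writing.

    mode "atomic": both tables are captured at one instant.
    mode "table_by_table": Table_A is copied, a write happens, then Table_B.
    """
    tables = {"Table_A": [], "Table_B": []}

    def client_write(n):
        # The client inserts a row in Table_A, then a row in Table_B
        # that refers back to it.
        tables["Table_A"].append(f"row{n}")
        tables["Table_B"].append(f"ref-to-row{n}")

    client_write(1)
    if mode == "atomic":
        backup = copy.deepcopy(tables)  # one point in time for everything
        client_write(2)                 # lands after the snapshot
    else:
        backup = {"Table_A": list(tables["Table_A"])}
        client_write(2)                 # lands between the two copies
        backup["Table_B"] = list(tables["Table_B"])
    return backup


def dangling_refs(backup):
    """Table_B entries whose referenced Table_A row is missing from the backup."""
    rows = set(backup["Table_A"])
    prefix = "ref-to-"
    return [ref for ref in backup["Table_B"] if ref[len(prefix):] not in rows]


if __name__ == "__main__":
    print(dangling_refs(take_backup("atomic")))
    print(dangling_refs(take_backup("table_by_table")))
```

In the table-by-table case the backup of Table_B refers to a row that the backup of Table_A never saw, which is the "incoherent" state Eric describes; a snapshot of everything at one instant cannot end up in that state.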
Eric Charles 2011-06-09, 09:19
Oops, sorry, I confused MapR (the company) with Map Reduce (MR, the technology). Time for me to update my knowledge of the Hadoop ecosystem.
Thanks,
- Eric