Amlan Roy 2012-07-23, 12:52
Hi,
Is it feasible to back up HBase tables to disk or tape?
I have read about tools such as Export, CopyTable, and DistCp, but they all seem to require a separate HDFS cluster.
Regards,
Amlan
Michael Segel 2012-07-23, 14:49
Amlan,
As always, the answer to your question is... it depends.
First, how much data are we talking about?
What's the value of the underlying data?
One possible scenario... You run an M/R job to copy data from the table to an HDFS file, which is then copied to attached storage on an edge node and from there to tape. Depending on how much data you have and how much disk is in the attached storage, you may want to keep a warm copy there, a 'warmer/hot' copy on HDFS, and a cold copy on tape at an offsite storage facility.
There are other options, but it all depends on what you want to achieve.
With respect to the other tools...
You can export (which is an M/R job) to a directory on the cluster's HDFS, then use distcp to copy it to a different cluster. hadoop fs -copyToLocal will let you copy the files off the cluster. You could write your own code, but you don't gain much over the existing UNIX/Linux tools.
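The export-then-copy route described above can be sketched as a small script. This is only a sketch under assumptions: the table name `orders` and the paths `/backup/hbase` and `/mnt/backup` are hypothetical, and the `run` wrapper defaults to a dry run that prints each command rather than executing it, so it can be tried without a cluster.

```shell
#!/bin/sh
# Sketch: export one HBase table to HDFS, then copy the export off the cluster.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# backup_table TABLE HDFS_DIR LOCAL_DIR   (all three values are hypothetical)
backup_table() {
  table=$1; hdfs_dir=$2; local_dir=$3
  # Export is a MapReduce job that writes the table as SequenceFiles in HDFS.
  run hbase org.apache.hadoop.hbase.mapreduce.Export "$table" "$hdfs_dir/$table" &&
  # Copy the exported files from HDFS to storage mounted on the edge node.
  run hadoop fs -copyToLocal "$hdfs_dir/$table" "$local_dir/$table"
}

backup_table orders /backup/hbase /mnt/backup
```

From the local directory the files can go to tape with whatever tooling is already in place. The `&&` keeps the copy from running if the export fails.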
Amlan Roy 2012-07-23, 15:33
Hi Michael,
Thanks a lot for the reply. What I want to achieve is this: if my cluster goes down for some reason, I should be able to create a new cluster and import all the backed-up data. As I want to store all the tables, I expect the data size to be huge (on the order of terabytes), and it will keep growing.
If I have understood correctly, you are suggesting that I run "export" to get the data into HDFS and then run "hadoop fs -copyToLocal" to get it into local files. If I back up those files, is it possible to import that data into a new HBase cluster?
Thanks and regards, Amlan
Alok Kumar 2012-07-23, 16:24
Hello everyone,
I have a similar use case: I've set up a separate HBase replica cluster and enabled 'REPLICATION_SCOPE' for my tables.
Q. Do I need to create the table and its column families in the backup cluster every time a new *table* is created in the 'production' cluster? Or is there a way for the table schema to be replicated across clusters as well (the way puts and deletes are replicated)?
Your help is highly appreciated. Thanks.
(I tried sending a separate email to the group, but it was returned as spam :( )
-- Alok Kumar
Minh Duc Nguyen 2012-07-23, 19:05
Once your backup data has been put back into HDFS, you can import it into HBase using this command:
  bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
See http://hbase.apache.org/book/ops_mgt.html#import for more information.
HTH,
Minh
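The restore path is the export in reverse: put the backed-up files back into HDFS, then run Import. A minimal sketch follows, again with hypothetical names (`orders`, `/mnt/backup`, `/restore/hbase`) and a dry-run default. It assumes the target table, with the same column families, has already been created on the new cluster and that the HDFS parent directory exists.

```shell
#!/bin/sh
# Sketch: load a backed-up export into HDFS and import it into an existing table.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# restore_table TABLE LOCAL_DIR HDFS_DIR   (all three values are hypothetical)
restore_table() {
  table=$1; local_dir=$2; hdfs_dir=$3
  # Copy the exported SequenceFiles from local storage back into HDFS.
  run hadoop fs -copyFromLocal "$local_dir/$table" "$hdfs_dir/$table" &&
  # Import is a MapReduce job; the target table is assumed to exist already.
  run hbase org.apache.hadoop.hbase.mapreduce.Import "$table" "$hdfs_dir/$table"
}

restore_table orders /mnt/backup /restore/hbase
```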
Michael Segel 2012-07-23, 23:03
There are a couple of nits...
1) Compression. This will help a bit when moving the files around.
2) Data size. You may have bandwidth issues. Moving TBs of data over a 1 GbE network can impact your cluster's performance, even with compression.
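To put rough numbers on the bandwidth point, here is a back-of-the-envelope estimate. The `estimate_hours` function and its efficiency factor are illustrative assumptions, not measurements: real throughput depends on the disks, the network, and whatever else the cluster is doing.

```shell
#!/bin/sh
# Sketch: hours needed to move SIZE_TB terabytes over a LINK_GBPS link,
# assuming only a fraction EFFICIENCY (0..1) of the line rate is usable.

# estimate_hours SIZE_TB LINK_GBPS EFFICIENCY
estimate_hours() {
  LC_ALL=C awk -v tb="$1" -v gbps="$2" -v eff="$3" 'BEGIN {
    bytes = tb * 1e12                     # decimal terabytes to bytes
    bytes_per_sec = gbps * 1e9 / 8 * eff  # link rate in bytes per second
    printf "%.1f\n", bytes / bytes_per_sec / 3600
  }'
}

estimate_hours 1 1 1     # 1 TB at full 1 GbE line rate: prints 2.2
estimate_hours 10 1 0.7  # 10 TB at 70% of line rate: prints 31.7
```

Even in the ideal case, every terabyte ties up a 1 GbE link for more than two hours, and the same holds for a restore.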
Depending on your cluster(s) and infrastructure, there is going to be a point where the cost of trying to back up to tape is going to exceed the cost of replicating to a second cluster. At the same time, you have to remember that restoring TBs of data will take time.
How large a data set has to be before that happens will vary by organization. Again, only you can determine the value of your data.
If you are backing up to a secondary cluster, you can use the replication feature in HBase. This would be a better fit if you are looking at backing up a large set of HBase tables.
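For the replication route, the setup on the source cluster comes down to a few HBase shell commands. The sketch below only prints them. The peer id, ZooKeeper quorum string, table name, and column family are hypothetical. The `add_peer` argument form shown is the older positional one, and newer releases take `CLUSTER_KEY => "..."` instead, so check it against your version. Older releases also needed `hbase.replication` set to true in hbase-site.xml on both clusters. As far as I know, replication ships edits only and not schema, so the table and its column families must already exist on the peer cluster.

```shell
#!/bin/sh
# Sketch: print the HBase shell commands that enable replication for one
# column family. Nothing is executed against a cluster here.

# replication_script PEER_ID PEER_ZK TABLE FAMILY   (all values are hypothetical)
replication_script() {
  cat <<EOF
add_peer '$1', "$2"
disable '$3'
alter '$3', {NAME => '$4', REPLICATION_SCOPE => 1}
enable '$3'
EOF
}

replication_script 1 "zk1.example.com:2181:/hbase" orders cf
# To apply on the source cluster, pipe the output into the HBase shell:
#   replication_script 1 "zk1.example.com:2181:/hbase" orders cf | hbase shell
```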