RE: HDFS snapshots restore
Bennie Schut 2013-11-29, 07:46
In addition to Binglin Chang's reply: whether you snapshot or manually copy the data, you need to understand a little about how Hive works to be able to do a correct restore.
Hive keeps its metadata in a separate database. For example, if you have a table with a date partition, Hive uses the metadata to know which partitions exist. Say you have these partitions on HDFS:
If you drop partition "2013-11-27", Hive also removes the metadata reference. So if you restore the data, the partition will exist on HDFS, but you still need to run some "add partition" commands before Hive knows the partition exists.
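As a rough sketch of those "add partition" commands: the helper below only prints the statements so they can be reviewed before being run. The table name "mytable" and the partition column "dt" are placeholders I made up for illustration; substitute your own.

```shell
#!/bin/sh
# Hypothetical names: table "mytable" and partition column "dt" are
# placeholders, not taken from this thread.

# Print one ADD PARTITION statement per partition value passed in.
add_partition_sql() {
    table="$1"; column="$2"; shift 2
    for value in "$@"; do
        printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s='%s');\n" \
            "$table" "$column" "$value"
    done
}

# Review the generated statements first, then feed them to Hive, e.g.:
#   add_partition_sql mytable dt 2013-11-27 > add_partitions.hql
#   hive -f add_partitions.hql
add_partition_sql mytable dt 2013-11-27
```

If the restored directories follow the table's normal partition layout, "MSCK REPAIR TABLE" may also be able to re-register them in one go, but check that it behaves as expected on your Hive version.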
It's usually a good idea to snapshot the metadata at the same time you snapshot the HDFS data, so you get one consistent view that you can trust to be correct.
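A minimal sketch of pairing the two backups, under assumptions that are mine and not from this thread: the metastore lives in a MySQL database named "metastore", and /user/hive has already been made snapshottable with "hdfs dfsadmin -allowSnapshot /user/hive". The helper only prints the commands; both share one label so the snapshot and the dump can be matched up later.

```shell
#!/bin/sh
# Assumptions (not from the thread): the metastore is a MySQL database
# named "metastore", and /user/hive is already snapshottable.

# Print the HDFS snapshot command and the metadata dump command,
# both tagged with the same label.
backup_commands() {
    label="$1"
    echo "hdfs dfs -createSnapshot /user/hive $label"
    echo "mysqldump --single-transaction metastore > metastore-$label.sql"
}

# Example: label both backups with today's date.
backup_commands "sn-$(date +%Y%m%d)"
```

The two commands are not atomic with respect to each other, so it is probably safest to run them while nothing is writing to the table.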
From: Binglin Chang [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 28, 2013 4:27 PM
To: [EMAIL PROTECTED]
Subject: Re: HDFS snapshots restore
The snapshot restore feature is not implemented yet. For now you can use distcp to copy the snapshot dir to your new cluster. Suppose your hive dir is /user/hive/ and the snapshot dir is /user/hive/.snapshot/sn0; then you can run:
hadoop distcp hdfs://oldcluster:8020/user/hive/.snapshot/sn0 hdfs://newcluster:8020/somedir
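For reverting a directory in place on the same cluster, a plain copy out of the read-only .snapshot path should work in the meantime. The sketch below is one way to do it, not an official restore procedure: it prints the commands by default (DRY_RUN=1), and the table directory "mytable" and the ".before-restore" suffix are names I made up. The current directory is moved aside first, because copying onto an existing directory would nest the copy inside it.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# restore_from_snapshot <snapshottable dir> <snapshot name> <relative path>
# Moves the current data aside, then copies the snapshot version back.
restore_from_snapshot() {
    dir="$1"; snap="$2"; rel="$3"
    run hdfs dfs -mv "$dir/$rel" "$dir/$rel.before-restore"
    run hdfs dfs -cp "$dir/.snapshot/$snap/$rel" "$dir/$rel"
}

# "mytable" is a hypothetical table directory under /user/hive.
restore_from_snapshot /user/hive sn0 mytable
```

Set DRY_RUN=0 only after checking the printed commands, and delete the ".before-restore" copy once the restored data has been verified.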
On Thu, Nov 28, 2013 at 9:47 PM, Juan Martin Pampliega <[EMAIL PROTECTED]> wrote:
I have read the documentation about HDFS snapshots for Hadoop 2 (http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html), but it is still not clear to me how to use these snapshots to restore the data.
Let's say I have a directory with the data corresponding to a Hive table that I want to back up. I take a snapshot today, and tomorrow I find out that the modifications made to the table/directory after the snapshot are wrong and I want to revert the directory to the snapshot state. How do I achieve this?
Also, can I extract the snapshot from HDFS, save it in external storage, and later use it to restore this directory in a new, empty cluster? Or what is the recommended way to do this?