Let me start by saying that everything I say is experimental and this
information does not carry any warranties of correctness.
You basically have two avenues of recovery when ZooKeeper data is lost:
1. Create new tables and bulk import your old RFiles.
2. Try to recreate the ZooKeeper data.
The first option has been done before, and is not too hard. You basically
just move the old HDFS directory, initialize a new instance, create all
your tables, find the RFiles from the old tables, and bulk load them into
the new tables. The risk here is that you will lose information that was
only in the write-ahead logs, and the conditions described in ACCUMULO-456
may cause you trouble.
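As a rough illustration of the "find the RFiles from the old tables" step, here is a minimal Python sketch that groups RFile paths by old table ID. It assumes the old directory follows a <base>/tables/<tableId>/<tabletDir>/<file>.rf layout and that you feed it a plain list of paths (for example, taken from a recursive HDFS listing). The function name, the regular expression, and the choice to skip IDs starting with "!" (the metadata table) are mine, not anything Accumulo provides.

```python
import re
from collections import defaultdict

# Assumed layout (not verified against your install):
#   <base>/tables/<tableId>/<tabletDir>/<file>.rf
RFILE_PATTERN = re.compile(
    r"/tables/(?P<table_id>[^/]+)/(?P<tablet_dir>[^/]+)/(?P<name>[^/]+\.rf)$"
)


def group_rfiles_by_table(paths):
    """Map each old table ID to the sorted list of RFile paths found under it.

    Hypothetical helper: paths that do not match the assumed layout are
    ignored, and so are table IDs starting with "!" (assumed to be the
    metadata table, which you would not bulk import).
    """
    grouped = defaultdict(list)
    for raw_path in paths:
        path = raw_path.strip()
        match = RFILE_PATTERN.search(path)
        if match is None:
            continue
        table_id = match.group("table_id")
        if table_id.startswith("!"):
            continue
        grouped[table_id].append(path)
    return {table_id: sorted(files) for table_id, files in grouped.items()}
```

You would still need to know which table name each old ID belonged to, and to move each group of files into a single directory before bulk loading it into the matching new table.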
The second option has never been done to our knowledge. The hard part there
is to create all of the tables that you used to have with the same table
IDs that they used to have, and with the same configuration. If you knew the
mapping of table ID to table name, you could probably write a script that
did something like:
1. Move old HDFS directory.
2. Initialize new instance.
3. Bring new instance online (except for the garbage collector).
4. Create tables in the same order that you created them with the old
instance (including creating and deleting tables that were deleted in the old instance).
5. Take the new instance offline.
6. Create references to the correct write-ahead log files for the root
tablet of the old instance in ZooKeeper.
7. Delete the new HDFS directory.
8. Copy the old HDFS directory into the location of the new HDFS directory.
(as long as this is a copy and you don't start the garbage collector you
should be able to repeat these steps until you get them right)
9. Bring the system online and hope everything worked.
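Step 4 is the part most likely to go wrong, so here is a minimal Python sketch of how a script might plan it. It rests on an assumption I have not verified for your version: that a fresh instance hands out table IDs sequentially as base-36 strings ("1", "2", ... "z", "10", ...), starting at "1". Under that assumption, any ID missing from your old mapping corresponds to a table that was created and later deleted, so the plan creates and deletes a throwaway table to burn that ID. The names plan_table_creation, to_base36, and the "placeholder_" prefix are hypothetical.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(number):
    """Format a non-negative integer as a lowercase base-36 string."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        number, remainder = divmod(number, 36)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


def plan_table_creation(id_to_name, first_id=1):
    """Return ("create"|"delete", table_name) steps that reproduce old IDs.

    ASSUMPTION: the new instance assigns IDs sequentially in base 36,
    beginning at first_id. id_to_name should contain user tables only
    (old table ID -> table name); gaps are filled with placeholder
    tables that are created and immediately deleted.
    """
    if not id_to_name:
        return []
    highest = max(int(table_id, 36) for table_id in id_to_name)
    plan = []
    for number in range(first_id, highest + 1):
        table_id = to_base36(number)
        if table_id in id_to_name:
            plan.append(("create", id_to_name[table_id]))
        else:
            placeholder = "placeholder_" + table_id
            plan.append(("create", placeholder))
            plan.append(("delete", placeholder))
    return plan
```

For example, an old mapping of {"1": "users", "3": "events"} would yield: create users, create and delete placeholder_2, create events. After running the plan, check in the new instance that each table really received the ID you expected before going on to step 5.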
On Thu, Jul 5, 2012 at 11:15 AM, Krishmin Rai <[EMAIL PROTECTED]> wrote:
> Hi All,
> We've recently encountered a strange situation on a small test cluster:
> after an awkward crash, our ZooKeeper data was erased and we no longer have
> the [accumulo] znode. The HDFS accumulo directory is intact, so all the
> RFiles and etc are still there, but it's not clear how best to bring
> Accumulo back up to its previous state. Obviously just starting Accumulo
> as-is complains about the missing znode ("Waiting for accumulo to be
> initialized"), whereas re-initializing is not possible over existing HDFS
> directories ("It appears this location was previously initialized,
> A couple of questions about recovery strategies:
> 1) Is there any way to re-create the znode for a previous instance-id? My
> understanding is that ZK is mostly used to store ephemeral data (such as
> which tserver is currently responsible for which tablets) and things like
> users (which we could re-create), so perhaps this is plausible?
> 2) I imagine that I could init Accumulo with a new instance.dfs.dir, then
> import the RFiles from the old installation back in. I see Patrick just
> asked a related question, so, with the data integrity caveats, I would
> essentially be following the last of the steps in ACCUMULO-456.
> 3) This is a vague question, but have any of you had experience with the
> [accumulo] znode being entirely deleted? Aside from stopping/starting ZK
> (3.3.5) and Accumulo 1.4.0 (possibly with a force-quit), I'm not sure what
> we could have done to actually delete it.
> This is just a test instance, and the data could easily be recreated, but
> I want to take this opportunity to learn a little more about Accumulo
> plumbing and maintenance.