Accumulo, mail # user - Waiting for accumulo to be initialized


Re: Waiting for accumulo to be initialized
Krishmin Rai 2013-03-27, 21:00
Hi Aji,
I wrote the original question linked below (about re-initing Accumulo over an existing installation).  For what it's worth, I believe that my ZooKeeper data loss was related to the linux+java leap second bug -- not likely to be affecting you now (I did not go back and attempt to re-create the issue, so it's also possible there were other compounding issues). We have not encountered any ZK data-loss problems since.

At the time, I did some basic experiments to understand the process better, and successfully followed (essentially) the steps Eric has described. The only real difficulty I had was identifying which directories corresponded to which tables; I ended up iterating over individual RFiles and manually identifying tables based on expected data. This was a somewhat painful process, but at least made me confident that it would be possible in production.
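For what it's worth, that inspection was along these lines (a rough sketch; the exact PrintInfo options and directory layout will vary by Accumulo version, and the .rf file name below is just a made-up example):

$ hadoop fs -ls /accumulo/tables                 # list the surviving table-id directories
$ hadoop fs -ls /accumulo/tables/a               # list the tablet directories for table id "a"
$ ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo -d \
      /accumulo/tables/a/default_tablet/F00001xy.rf   # dump entries to see what data the file holds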

It's also important to note that, at least according to my understanding, this procedure can still lose data: mutations written after the last minor compaction will only have reached the write-ahead logs and will not be available in the raw RFiles you're importing from.

-Krishmin

On Mar 27, 2013, at 4:45 PM, Aji Janis wrote:

> Eric, really appreciate you jotting this down. Too late to try it out this time, but I will give it a try if (hopefully not) there is a next time.
>
> Thanks again.
>
>
>
> On Wed, Mar 27, 2013 at 4:19 PM, Eric Newton <[EMAIL PROTECTED]> wrote:
> I should write this up in the user manual.  It's not that hard, but it's really not the first thing you want to tackle while learning how to use accumulo.  I just opened ACCUMULO-1217 to do that.
>
> I wrote this from memory: expect errors.  Needless to say, you would only want to do this when you are more comfortable with hadoop, zookeeper and accumulo.
>
> First, get zookeeper up and running, even if you have to delete all its data.
>
> Next, attempt to determine the mapping of table names to tableIds.  You can do this in the shell when your accumulo instance is healthy.  If it isn't healthy, you will have to guess based on the data in the files in HDFS.
>
> So, for example, the table "trace" is probably table id "1".  You can find the files for trace in /accumulo/tables/1.
>
> Don't worry if you get the names wrong.  You can always rename the tables later.
>
> Move the old files for accumulo out of the way and re-initialize:
>
> $ hadoop fs -mv /accumulo /accumulo-old
> $ ./bin/accumulo init
> $ ./bin/start-all.sh
>
> Recreate your tables:
>
> $ ./bin/accumulo shell -u root -p mysecret
> shell > createtable table1
>
> Learn the new table id mapping:
> shell > tables -l
> !METADATA => !0
> trace => 1
> table1 => 2
> ...
>
> Bulk import all your data back into the new table ids:
> Assuming you have determined that "table1" used to be table id "a" and is now "2",
> you do something like this:
>
> $ hadoop fs -mkdir /tmp/failed
> $ ./bin/accumulo shell -u root -p mysecret
> shell > table table1
> shell table1 > importdirectory /accumulo-old/tables/a/default_tablet /tmp/failed true
>
> There are lots of directories under every table id directory.  You will need to import each of them.  I suggest creating a script and passing it to the shell on the command line.
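> Off the top of my head, a script for one table might look something like this (the -f option passes a file of commands to the shell; double-check it on your version):
>
> $ (echo "table table1"; \
>    hadoop fs -ls /accumulo-old/tables/a | awk '{print $NF}' | grep '/tables/a/' | \
>    while read dir; do echo "importdirectory $dir /tmp/failed true"; done) > import-a.txt
> $ ./bin/accumulo shell -u root -p mysecret -f import-a.txt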
>
> I know of instances in which trillions of entries were recovered and available in a matter of hours.
>
> -Eric
>
>
>
> On Wed, Mar 27, 2013 at 3:39 PM, Aji Janis <[EMAIL PROTECTED]> wrote:
> When you say "you can move the files aside in HDFS"... which files are you referring to? I have never set up zookeeper myself, so I am not aware of all the changes needed.
>
>
>
> On Wed, Mar 27, 2013 at 3:33 PM, Eric Newton <[EMAIL PROTECTED]> wrote:
> If you lose zookeeper, you can move the files aside in HDFS, recreate your instance in zookeeper and bulk import all of the old files.  It's not perfect: you lose table configurations, split points and user permissions, but you do preserve most of the data.