Re: Waiting for accumulo to be initialized
Eric Newton 2013-03-27, 20:19
I should write this up in the user manual.  It's not that hard, but it's
really not the first thing you want to tackle while learning how to use
accumulo.  I just opened ACCUMULO-1217
<https://issues.apache.org/jira/browse/ACCUMULO-1217> to do that.

I wrote this from memory: expect errors.  Needless to say, you would only
want to do this when you are more comfortable with hadoop, zookeeper and
accumulo.

First, get zookeeper up and running, even if you have to delete all its data.
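
For example, something like this, assuming a stock zookeeper install in
$ZOOKEEPER_HOME and the default client port of 2181 (adjust paths and port
for your setup):

$ $ZOOKEEPER_HOME/bin/zkServer.sh start
$ echo ruok | nc localhost 2181   # a healthy server answers "imok"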

Next, attempt to determine the mapping of table names to tableIds.  You can
do this in the shell when your accumulo instance is healthy.  If it isn't
healthy, you will have to guess based on the data in the files in HDFS.

So, for example, the table "trace" is probably table id "1".  You can find
the files for trace in /accumulo/tables/1.
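
To see what you have to work with, list the table directories in HDFS.  The
ids and layout will vary; this is just a sketch:

$ hadoop fs -ls /accumulo/tables      # one subdirectory per table id
$ hadoop fs -ls /accumulo/tables/1    # tablet directories holding that table's files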

Don't worry if you get the names wrong.  You can always rename the tables
later.
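
If a guess turns out to be wrong, renaming is a one-liner in the shell (the
names here are made up):

shell > renametable table1 the_name_it_should_have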

Move the old files for accumulo out of the way and re-initialize:

$ hadoop fs -mv /accumulo /accumulo-old
$ ./bin/accumulo init
$ ./bin/start-all.sh

Recreate your tables:

$ ./bin/accumulo shell -u root -p mysecret
shell > createtable table1

Learn the new table id mapping:
shell > tables -l
!METADATA => !0
trace => 1
table1 => 2
...

Bulk import all your data back into the new table ids.  Assuming you have
determined that "table1" used to be table id "a" and is now "2", you do
something like this:

$ hadoop fs -mkdir /tmp/failed
$ ./bin/accumulo shell -u root -p mysecret
shell > table table1
shell table1 > importdirectory /accumulo-old/tables/a/default_tablet /tmp/failed true

There are lots of directories under every table id directory.  You will
need to import each of them.  I suggest creating a script and passing it to
the shell on the command line.
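
A minimal sketch of such a script, reusing the example names from above
("a" for the old table id, "table1" for the new table, the root password
from earlier); the -f option tells the shell to execute commands from a
file, but double-check the shell options on your version:

$ echo "table table1" > /tmp/import-table1.cmds
$ hadoop fs -ls /accumulo-old/tables/a | grep '^d' | awk '{print $NF}' | \
    while read dir; do echo "importdirectory $dir /tmp/failed true"; done >> /tmp/import-table1.cmds
$ ./bin/accumulo shell -u root -p mysecret -f /tmp/import-table1.cmds

If you expect failures, you may want a separate, empty failure directory per
import rather than reusing /tmp/failed for all of them.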

I know of instances in which trillions of entries were recovered and
available in a matter of hours.

-Eric

On Wed, Mar 27, 2013 at 3:39 PM, Aji Janis <[EMAIL PROTECTED]> wrote:

> when you say "you can move the files aside in HDFS" ... which files are
> you referring to? I have never set up zookeeper myself so I am not aware of
> all the changes needed.
>
>
>
> On Wed, Mar 27, 2013 at 3:33 PM, Eric Newton <[EMAIL PROTECTED]> wrote:
>
>> If you lose zookeeper, you can move the files aside in HDFS, recreate
>> your instance in zookeeper and bulk import all of the old files.  It's not
>> perfect: you lose table configurations, split points and user permissions,
>> but you do preserve most of the data.
>>
>> You can back up each of these bits of information periodically if you
>> like.  Outside of the files in HDFS, the configuration information is
>> pretty small.
>>
>> -Eric
>>
>>
>>
>> On Wed, Mar 27, 2013 at 3:18 PM, Aji Janis <[EMAIL PROTECTED]> wrote:
>>
>>> Eric and Josh, thanks for all your feedback. We ended up *losing all
>>> our accumulo data* because I had to reformat hadoop. Here is, in a
>>> nutshell, what I did:
>>>
>>>
>>>    1. Stop accumulo
>>>    2. Stop hadoop
>>>    3. On hadoop master and all datanodes, from dfs.data.dir
>>>    (hdfs-site.xml) remove everything under the data folder
>>>    4. On hadoop master, from dfs.name.dir (hdfs-site.xml) remove
>>>    everything under the name folder
>>>    5. As hadoop user, execute .../hadoop/bin/hadoop namenode -format
>>>    6. As hadoop user, execute .../hadoop/bin/start-all.sh ==> should
>>>    populate the data/ and name/ dirs that were erased in steps 3 and 4.
>>>    7. Initialized Accumulo - as accumulo user,
>>>     ../accumulo/bin/accumulo init (I created a new instance)
>>>    8. Start accumulo
>>>
>>> I was wondering if anyone had suggestions or thoughts on how I could
>>> have solved the original issue of accumulo waiting for initialization
>>> without losing my accumulo data? Is it possible to do so?
>>>
>>
>>
>