Re: How to re-IP an Accumulo Cluster
Keith Turner 2013-08-15, 16:22
Use zkCli.sh and look in /accumulo/<accumulo instance id>
In 1.4, Accumulo started locking down its info in zookeeper, so you may need
to execute the following command:
addauth digest accumulo:SECRET
Replace SECRET with the secret from your accumulo-site.xml file.
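
If you would rather script the lookup than type into zkCli.sh, the same steps can be sketched roughly in Python. This is only a sketch under several assumptions: it uses the third-party kazoo client library, the tablet servers are assumed to register under a "tservers" child of the instance node, and the lock node names are assumed to end in a "-<sequence number>" suffix. The function names and arguments are made up for illustration, so check the paths against your own tree first.

```python
def lock_sequence(node_name):
    """Return the trailing sequence number of a lock node name.

    Assumes names of the form '<prefix>-<digits>', e.g. 'zlock-0000000007'.
    """
    return int(node_name.rsplit("-", 1)[1])


def lowest_lock_node(children):
    """Pick the lowest numbered child, i.e. the node that holds the lock."""
    return min(children, key=lock_sequence) if children else None


def dump_tservers(zk_hosts, instance_id, secret):
    """Print each tserver entry and the data of its lowest numbered lock node."""
    # kazoo is a third-party library; it is imported here so the helpers
    # above can be used without it.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    try:
        # Same purpose as `addauth digest accumulo:SECRET` in zkCli.sh.
        zk.add_auth("digest", "accumulo:" + secret)
        base = "/accumulo/%s/tservers" % instance_id  # assumed layout
        for addr in sorted(zk.get_children(base)):
            holder = lowest_lock_node(zk.get_children(base + "/" + addr))
            if holder is None:
                print(addr, "-> no lock node")
                continue
            data, _stat = zk.get(base + "/" + addr + "/" + holder)
            print(addr, "->", holder, data)
    finally:
        zk.stop()
```

Calling dump_tservers with your zookeeper host string, the instance id, and the secret from accumulo-site.xml should list one line per tserver address, which makes it easy to spot entries that still carry the old IPs.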
On Thu, Aug 15, 2013 at 12:05 PM, Terry P. <[EMAIL PROTECTED]> wrote:
> Hi Keith,
> Many thanks for your detailed reply. I forgot to mention that yes indeed
> this is on Accumulo 1.4.2, and it was the write-ahead logs that were the
> issue -- partly because two of the tabletservers were not properly shut down
> before the re-IP operation, so recovery may have been needed on them.
> My naivety on Zookeeper certainly hampered the research as well. How does
> one "look in zookeeper to see what is going on?" Any pointers would be
> really helpful.
> I wish we could go to 1.5 and take advantage of the walogs in HDFS, but no
> can do at this point unfortunately.
> On Thu, Aug 15, 2013 at 10:24 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
>> On Thu, Aug 15, 2013 at 11:01 AM, Terry P. <[EMAIL PROTECTED]> wrote:
>>> Greetings everyone,
>>> We had to re-IP our entire cluster recently to change subnetworks, and
>>> we essentially lost everything (it was development, so no big deal).
>>> However, doing a re-IP operation may be required in actual operational
>>> cases, and I'd like to know if it can be done or not so we can note it for
>>> the future (as in document "what not to do" to avoid data loss).
>>> The issue we had was that after shutting down the cluster, re-IPing all
>>> servers, and starting everything back up, the tablets were still assigned
>>> to Tabletservers with the old IP addresses, even though all the hostnames
>>> were the same. So the system showed 3 Tabletservers, but no tablets, and
>>> no entries in the tables where previously there were 400 million.
>>> A) Does Zookeeper track Tabletservers by IP address only, and not
>> It does track by IP address, but not only IP address. Each tablet server
>> has an ephemeral node in zookeeper under the IP address. This ephemeral
>> node should go away when the tserver process dies, and then the master will
>> assume that tserver is dead. The location of a tablet in the metadata
>> table is conceptually <ephemeral node id>+<IP address>, so once that
>> ephemeral node goes away the location in metadata table is assumed invalid
>> and the tablet is reassigned. If another tserver starts at the same IP,
>> then the master can differentiate because the ephemeral node is different.
>> You can look at the child nodes under a tserver IP in zookeeper. Look
>> at the data for the lowest numbered ephemeral node to get info about
>> who holds the lock for that IP.
>>> B) If A is true, is there a mechanism to change those entries in
>>> Zookeeper so that a re-IP operation could be performed?
>> A first step would be to look in zookeeper and see what is going on with the
>> ephemeral nodes.
>> In Accumulo 1.3 and 1.4 one thing that normally causes problems when
>> changing lots of IP addrs is write ahead logs. Tablets point to their
>> write ahead logs using the IP address of the logger. This can cause walog
>> recovery to fail. In 1.5 walogs are stored in HDFS, so this is not an issue.
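
To make the two failure modes in this thread concrete, here is a toy Python model of them. It is not how Accumulo implements anything: the Location tuple, the function names, and the dictionaries standing in for zookeeper and the metadata table are all hypothetical, and the addresses in the usage note are invented. It only illustrates the rules described above, namely that a tablet location is conceptually <ephemeral node id>+<IP address>, and that walog references are keyed by the logger's IP address.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a tablet location in the metadata table.
Location = namedtuple("Location", ["address", "node_id"])


def location_is_valid(location, live_nodes):
    """Return True if the recorded location still matches a live tserver.

    live_nodes maps a tserver address to the id of its current ephemeral
    node. A location is valid only if both the address and the node id match.
    """
    return live_nodes.get(location.address) == location.node_id


def stale_walog_refs(walog_refs, live_loggers):
    """Return the (tablet, logger address) pairs whose logger is not live.

    walog_refs is a list of (tablet, logger address) pairs; live_loggers is
    the set of logger addresses currently reachable.
    """
    return [(tablet, addr) for tablet, addr in walog_refs
            if addr not in live_loggers]
```

For example, a location recorded as ("10.0.0.5:9997", "zlock-0000000001") is no longer valid once the server at that address comes back with a different node id, or once no server exists at that address at all, so the tablet gets reassigned. A walog reference to a logger at an old address, on the other hand, shows up in stale_walog_refs, which corresponds to the recovery failure described for 1.3 and 1.4.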