Many thanks for your detailed reply. I forgot to mention that yes indeed
this is on Accumulo 1.4.2, and it was the write-ahead logs that were the
issue -- partly because two of the tabletservers were not properly shut down
before the re-IP operation, so recovery may have been needed on them.
My naivety on Zookeeper certainly hampered the research as well. How does
one "look in zookeeper to see what is going on?" Any pointers would be
appreciated.
I wish we could go to 1.5 and take advantage of the walogs in HDFS, but no
can do at this point unfortunately.
On Thu, Aug 15, 2013 at 10:24 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
> On Thu, Aug 15, 2013 at 11:01 AM, Terry P. <[EMAIL PROTECTED]> wrote:
>> Greetings everyone,
>> We had to re-IP our entire cluster recently to change subnetworks, and we
>> essentially lost everything (it was development, so no big deal). However,
>> doing a re-IP operation may be required in actual operational cases, and
>> I'd like to know if it can be done or not so we can note it for the future
>> (as in document "what not to do" to avoid data loss).
>> The issue we had was that after shutting down the cluster, re-IPing all
>> servers, and starting everything back up, the tablets were still assigned
>> to Tabletservers with the old IP addresses, even though all the hostnames
>> were the same. So the system showed 3 Tabletservers, but no tablets, and
>> no entries in the tables where previously there were 400 million.
>> A) Does Zookeeper track Tabletservers by IP address only, and not
>> hostname?
> It does track by IP address, but not only IP address. Each tablet server
> has an ephemeral node in zookeeper under the IP address. This ephemeral
> node should go away when the tserver process dies, and then the master will
> assume that tserver is dead. The location of a tablet in the metadata
> table is conceptually <ephemeral node id>+<IP address>, so once that
> ephemeral node goes away the location in metadata table is assumed invalid
> and the tablet is reassigned. If another tserver starts at the same IP,
> then the master can differentiate because the ephemeral node is different.
> You can look at the child nodes under a tserver IP in zookeeper. Look
> at the data for the lowest-numbered ephemeral node to get info about
> who holds the lock for that IP.
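
As a rough illustration of the "lowest numbered ephemeral node" lookup described above, here is a minimal Python sketch. It is not taken from Accumulo; the child naming ("zlock-" plus a zero-padded sequence number) and the example path are assumptions, and `zk` is any client object exposing `get_children(path)` and `get(path)` returning `(data, stat)`, which is the shape of kazoo's `KazooClient`.

```python
import re

# Assumed naming: lock children look like "zlock-0000000003", i.e. a prefix
# followed by a zero-padded sequence number assigned by ZooKeeper.
_SEQ = re.compile(r"(\d+)$")


def lowest_numbered(children):
    """Return the child with the smallest trailing sequence number, or None."""
    numbered = [
        (int(match.group(1)), name)
        for name in children
        for match in [_SEQ.search(name)]
        if match
    ]
    return min(numbered)[1] if numbered else None


def lock_holder(zk, tserver_path):
    """Return (child_name, data) for the lock holder under tserver_path.

    `zk` is a duck-typed client: get_children(path) -> list of names,
    get(path) -> (data, stat).
    """
    child = lowest_numbered(zk.get_children(tserver_path))
    if child is None:
        # No ephemeral node left: per the explanation above, the master
        # would assume this tserver is dead.
        return None
    data, _stat = zk.get(tserver_path + "/" + child)
    return child, data
```

With a started kazoo client this might be called as `lock_holder(zk, "/accumulo/<instance-id>/tservers/<ip:port>")`, where the path layout is an assumption to verify against your own instance. The same inspection can be done by hand with the `ls` and `get` commands in ZooKeeper's `zkCli.sh`.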
>> B) If A is true, is there a mechanism to change those entries in
>> Zookeeper so that a re-IP operation could be performed?
> A first step would be to look in zookeeper and see what is going on with the
> ephemeral nodes.
> In Accumulo 1.3 and 1.4 one thing that normally causes problems when
> changing lots of IP addrs is write ahead logs. Tablets point to their
> write ahead logs using the IP address of the logger. This can cause walog
> recovery to fail. In 1.5 walogs are stored in HDFS, so this is not an issue.
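
To make the write-ahead log problem concrete, here is a small hypothetical sketch of why a re-IP can strand log references. The reference format `"<ip>:<port>/<log-id>"` and the addresses used are illustrative assumptions, not the actual 1.4 metadata layout; the point is only that a reference keyed by the logger's old IP no longer matches any live logger.

```python
def stale_walog_refs(log_refs, live_loggers):
    """Return the references whose logger address is not currently live.

    Each reference is assumed to look like "<ip>:<port>/<log-id>";
    `live_loggers` is an iterable of "<ip>:<port>" strings.
    """
    live = set(live_loggers)
    # The part before the first "/" is the logger address the tablet recorded.
    return [ref for ref in log_refs if ref.split("/", 1)[0] not in live]
```

For example, with references recorded against `10.0.0.5:11224` and loggers now running on `10.1.0.5:11224`, every old reference would be reported as stale, which matches the failed recovery described above.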