-
Failed recovery after master/tservers IP changes
Kristopher Kane 2012-09-23, 14:50
All,
I was doing some shuffling around at home and changed the IPs on my master and all tservers. I thought this would be OK because I had configured everything via hostnames, but I've got some log entries that say otherwise:

Unable to recover 192.168.122.222:11224/d41fc0de-f4bc-4c28-a4ae-a0114c5911d7(java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out)
java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out

This is the master reporting on two of my four tservers. Also to note: I had an unclean shutdown prior to the IP changes, the monitor shows no tablets loaded for any table, and the recovery directory in HDFS is empty. I don't need the data and I can always init, but I was curious about fixing this to learn more about the system.
Where is a good place to start?
Thanks, -Kris Kane
-
Re: Failed recovery after master/tservers IP changes
Kristopher Kane 2012-09-23, 15:07
I left some parts out:
This is 1.4, and the GC process for file collection has been running since the cluster was turned on. So, does that mean things are being held up in the write-ahead (WA) logs?
-Kris
-
Re: Failed recovery after master/tservers IP changes
John Vines 2012-09-23, 16:02
Accumulo registers write-ahead logs by logger IP, not hostname. So even though you start up processes by hostname, log recovery still depends on the IPs staying consistent.
Sent from my phone, so pardon the typos and brevity.
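To make the point about IP-registered write-ahead logs concrete, here is a minimal sketch that pulls the `ip:port/id` reference out of a master log line like the one quoted in the first message and flags references whose IP no longer belongs to any host. The function names, the hostnames, and the "new" IPs are hypothetical and only for illustration; this is not part of Accumulo.

```python
import re
from typing import Dict, List, NamedTuple


class WalRef(NamedTuple):
    """A write-ahead log reference as it appears in the master's log."""
    ip: str
    port: int
    log_id: str


# Matches "<ipv4>:<port>/<36-char id>", the shape seen in the quoted error.
_WAL_REF = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):(\d+)/([0-9a-fA-F-]{36})")


def parse_wal_refs(text: str) -> List[WalRef]:
    """Extract every ip:port/id reference found in a chunk of log text."""
    return [WalRef(ip, int(port), log_id)
            for ip, port, log_id in _WAL_REF.findall(text)]


def stale_refs(refs: List[WalRef], current_ips: Dict[str, str]) -> List[WalRef]:
    """Return the references whose IP is not assigned to any current host.

    current_ips maps hostname -> current IP (a hypothetical inventory that
    you would fill in from your own DNS or /etc/hosts).
    """
    live = set(current_ips.values())
    return [ref for ref in refs if ref.ip not in live]


if __name__ == "__main__":
    line = ("Unable to recover 192.168.122.222:11224/"
            "d41fc0de-f4bc-4c28-a4ae-a0114c5911d7(java.io.IOException: ...)")
    # Hypothetical post-change addresses; substitute your own.
    hosts = {"tserver1": "192.168.1.11", "tserver2": "192.168.1.12"}
    for ref in stale_refs(parse_wal_refs(line), hosts):
        print(f"log {ref.log_id} still points at old address {ref.ip}:{ref.port}")
```

Any reference this reports is one the master would try to recover from an address that no host answers on anymore, which is consistent with the connection timeouts in the log.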
-
Re: Failed recovery after master/tservers IP changes
Jim Klucar 2012-09-23, 16:59
Hadoop has some weird DNS/Reverse DNS lookup requirements. My guess would be that Hadoop is bonking.
Sent from my iPhone
-
Re: Failed recovery after master/tservers IP changes
John Vines 2012-09-23, 17:04
No, Jim, it's a Thrift transport exception, and the DFSClient doesn't use Thrift. HDFS is fairly well designed to avoid needing any sort of host identity for persistence.
John
Sent from my phone, so pardon the typos and brevity.
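Since the failure is a Thrift transport exception wrapping a `java.net.ConnectException`, one quick way to confirm it is a plain reachability problem is to attempt a TCP connection to the address from the log line. The sketch below is only a TCP probe under that assumption: it does not speak Thrift, and the helper name is hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Timeouts, refusals, and unreachable hosts all surface as OSError
    subclasses, so they are all reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Address taken from the error message quoted earlier in the thread.
    old_ip, port = "192.168.122.222", 11224
    if can_connect(old_ip, port):
        print(f"{old_ip}:{port} is reachable")
    else:
        print(f"{old_ip}:{port} is not reachable; nothing is listening on the old IP")
```

If the probe fails against the old IP but succeeds against the host's new IP on the same port, that points at the stale address rather than at HDFS.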