Accumulo, mail # user - slave tserver not responding


Re: slave tserver not responding
Josh Elser 2014-01-01, 23:35
Ok -- turned out to be a couple of little things, with one big one :D

The big one -- iptables was still running on the slave :)

I noticed that the datanode logs showed the same NoRouteToHostException
when trying to replicate, so I assumed the cause was something outside
of Hadoop. Running `telnet slave_ip_addr port` with the address and port
that showed up in the stack trace verified that I indeed could not
connect. iptables had an exception for SSH, which is why SSH'ing still
worked and Arshak could start the processes.
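
For anyone hitting the same symptom, the check above can be sketched as a
small shell helper. This is only an illustration: `check_port` is a
made-up name, the 5-second timeout is arbitrary, and the iptables
commands in the comments assume a Red Hat-style init setup, which the
thread does not confirm.

```shell
#!/bin/sh
# check_port HOST PORT
# Prints "open" if a TCP connection succeeds, "unreachable" otherwise.
# Hypothetical helper; relies on an nc that supports -z and -w.
check_port() {
    if nc -z -w 5 "$1" "$2" 2>/dev/null; then
        echo "open"
    else
        echo "unreachable"
    fi
}

# Example, run from the master against the tserver port in the logs:
#   check_port 10.240.203.36 9997
#
# If the port is unreachable while the tserver is running, look at the
# firewall on the slave itself (as root):
#   iptables -L -n            # list the active rules
#   service iptables status   # assumption: SysV-style init script
```

A working SSH session proves nothing here, since the firewall can allow
port 22 while rejecting everything else.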

Small things:

ifconfig showed that IPv6 was still enabled on the interfaces. I
disabled it at runtime via procfs and permanently via sysctl. It would
likely have caused more trouble later, but I noticed it before I found
the iptables problem.
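
A rough sketch of the procfs and sysctl steps follows. The function names
are invented for illustration, and applying the setting to `all` and
`default` rather than to individual interfaces is an assumption; the
thread does not say which keys were changed. Both privileged functions
need root.

```shell
#!/bin/sh
# Print the sysctl settings that turn IPv6 off.
ipv6_disable_settings() {
    printf '%s\n' \
        'net.ipv6.conf.all.disable_ipv6 = 1' \
        'net.ipv6.conf.default.disable_ipv6 = 1'
}

# Runtime change through procfs (root); lost on reboot.
disable_ipv6_now() {
    for scope in all default; do
        echo 1 > "/proc/sys/net/ipv6/conf/$scope/disable_ipv6"
    done
}

# Permanent change (root): append to /etc/sysctl.conf and reload.
disable_ipv6_persist() {
    ipv6_disable_settings >> /etc/sysctl.conf
    sysctl -p
}
```

Nothing runs until one of the functions is called, so the file can be
sourced and inspected first.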

Max open files was still at 1024, which was likely to cause you more
problems. I raised the limit for the user you run Accumulo as.
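
One common way to raise the limit is through pam_limits entries in
/etc/security/limits.conf. The sketch below only generates the lines. The
user name `accumulo` and the value 65536 in the example are placeholders;
the thread gives neither the account name nor the new limit.

```shell
#!/bin/sh
# nofile_limits USER LIMIT
# Print limits.conf entries raising the open-file limit for USER.
# Hypothetical helper; review the output before appending it.
nofile_limits() {
    printf '%s soft nofile %s\n' "$1" "$2"
    printf '%s hard nofile %s\n' "$1" "$2"
}

# Check the current soft limit for the logged-in user:
#   ulimit -n
#
# Example (as root), then log in again as that user to pick it up:
#   nofile_limits accumulo 65536 >> /etc/security/limits.conf
```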

- Josh

On 1/1/14, 2:28 PM, Josh Elser wrote:
> Sure -- you have my address already.
>
> Also, nc not working while the tabletserver is dead makes sense (that
> process is what's listening on that port). Once the process dies,
> there's nothing else listening.
>
> On 1/1/2014 1:31 PM, Arshak Navruzyan wrote:
>> If anyone wants to look at my live environment please let me know (your
>> gmail) and I will add you to the Google Compute Engine.  Thanks!
>>
>>
>> On Wed, Jan 1, 2014 at 7:58 AM, Arshak Navruzyan <[EMAIL PROTECTED]
>> <mailto:[EMAIL PROTECTED]>> wrote:
>>
>>     Sean
>>
>>     Thanks for looking into the log files.
>>
>>     These are two Google Compute Engine instances under the same project
>>     so there shouldn't be any firewall between them.
>>
>>     For the brief moment that the slave runs during startup, I can nc
>>     into port 9997 from the master to the slave.  But after it crashes,
>>     I can't.  Seems like somehow the problem is on the slave.
>>
>>     Arshak
>>
>>     On Dec 31, 2013 11:58 PM, "Sean Busbey" <busbey+[EMAIL PROTECTED]
>>     <mailto:busbey%[EMAIL PROTECTED]>> wrote:
>>
>>         Well, I can tell you the proximal cause.  The tserver log shows
>>         that it starts normally, then exits because it's told to (via
>>         the zookeeper lock being removed).
>>
>>         If you look at the master debug logs, this happens because the
>>         master fails in three attempts to talk to the tserver, all with
>>         the same error:
>>
>>         2014-01-01 06:17:20,231 [master.Master] ERROR: unable to get
>>         tablet server status 10.240.203.36:9997[1434c70ed30001b]
>>         org.apache.thrift.transport.TTransportException:
>>         java.net.NoRouteToHostException: No route to host
>>
>>         Unfortunately, this is the same error you noticed in your first
>>         email. After 3 of those, the master deletes the zk lock so that
>>         the tserver will shut down.
>>
>>         Could there be another firewall blocking access to port 9997 on
>>         the worker machine from the master machine?
>>
>>         Check from the master (you'll need netcat):
>>
>>         $ nc -z 10.240.203.36 9997
>>         $ echo $?
>>
>>
>>
>>
>>
>>         On Wed, Jan 1, 2014 at 12:33 AM, Arshak Navruzyan
>>         <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
>>
>>             I am probably missing something really basic so I posted
>>             both the master and the slave log files:
>>
>>             https://www.dropbox.com/sh/liv1mzuohyiv6uu/X5kx7AZJ6i
>>
>>             Thanks again to everyone for the help!
>>
>>
>>             On Tue, Dec 31, 2013 at 10:20 PM, Arshak Navruzyan
>>             <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
>>
>>                 disabled selinux (iptables already off) on both master
>>                 and slave but it didn't make a difference unfortunately.
>>
>>
>>
>>                 On Tue, Dec 31, 2013 at 9:25 PM, Kurt Christensen
>>                 <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
>>
>>
>>                     SELINUX disabled? IPTABLES configured? I have