Accumulo >> mail # user >> slave tserver not responding


Re: slave tserver not responding
Ok -- turned out to be a couple of little things, with one big one :D

The big one -- iptables was still running on the slave :)

I noticed that you were getting the same NoRouteToHostException errors in
the datanode logs when it tried to replicate, so I assumed the cause was
something outside of Hadoop. Running `telnet slave_ip_addr port` with the
address and port that were showing up in the stack trace verified that I
indeed could not connect. iptables had an exception for SSH, so that's why
SSH'ing still worked and Arshak could start the processes.
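
A rough sketch of that kind of reachability check, for anyone without telnet or nc at hand. It assumes bash (for its /dev/tcp redirection) and coreutils `timeout`; `port_open` and `list_rules` are hypothetical helper names, not anything from this thread, and `list_rules` assumes the iptables CLI and sudo are available on the host.

```shell
#!/usr/bin/env bash
# port_open HOST PORT
# Succeeds (exit 0) only if a TCP connection to HOST:PORT opens within 3 seconds.
# Uses bash's /dev/tcp redirection, so neither telnet nor nc is required.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# list_rules: print the active filter rules with numeric addresses (needs root).
# Defined for reference only; nothing here calls it automatically.
list_rules() {
    sudo iptables -L -n --line-numbers
}
```

Run from the master, something like `port_open 10.240.203.36 9997 && echo reachable || echo blocked` would test the tserver address from the log quoted below. If SSH works but that port does not, `list_rules` on the slave is a reasonable next look.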

Small things:

According to ifconfig, IPv6 was still enabled, so I disabled it via
procfs and made that permanent via sysctl. It would likely have
caused more trouble, but I noticed it before iptables.
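
A minimal sketch of what the IPv6 step can look like on a typical Linux host. The exact keys used on this machine are not stated in the thread, so the `all`/`default` entries below are an assumption; `ipv6_state` is a hypothetical helper that only reads the current setting and changes nothing.

```shell
#!/bin/sh
# ipv6_state [FILE]
# Prints "disabled", "enabled", or "unknown" based on the kernel's
# disable_ipv6 flag. FILE defaults to the system-wide procfs entry.
ipv6_state() {
    flag_file="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
    if [ ! -r "$flag_file" ]; then
        echo "unknown"
    elif [ "$(cat "$flag_file")" = "1" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

echo "IPv6 is currently: $(ipv6_state)"

# Assumed commands for the change itself (run as root):
#   echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
#   echo 1 > /proc/sys/net/ipv6/conf/default/disable_ipv6
# To persist across reboots, add to /etc/sysctl.conf and reload with `sysctl -p`:
#   net.ipv6.conf.all.disable_ipv6 = 1
#   net.ipv6.conf.default.disable_ipv6 = 1
```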

Max open files was still at 1024, which was likely to cause you more
problems. I just upped the limit for the user you run Accumulo as.
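
A small sketch for checking the open-files limit from the shell of the user that runs Accumulo. The 4096 threshold, the `accumulo` user name, and the 65536 value are all illustrative assumptions, not values from this thread; `nofile_ok` is a hypothetical helper name.

```shell
#!/bin/sh
# nofile_ok MIN: succeeds if the soft open-files limit is at least MIN.
nofile_ok() {
    limit=$(ulimit -n)
    [ "$limit" = "unlimited" ] || [ "$limit" -ge "$1" ]
}

# 4096 is an illustrative threshold only.
if nofile_ok 4096; then
    echo "open-files limit looks sufficient: $(ulimit -n)"
else
    echo "open-files limit is low: $(ulimit -n)"
fi

# To raise it persistently on systems using pam_limits, lines like these can
# go in /etc/security/limits.conf (user name and value are assumed):
#   accumulo  soft  nofile  65536
#   accumulo  hard  nofile  65536
```

The new limit only applies to sessions started after the change, so the Accumulo processes need to be restarted from a fresh login.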

- Josh

On 1/1/14, 2:28 PM, Josh Elser wrote:
> Sure -- you have my address already.
>
> Also, nc not working while the tabletserver is dead makes sense (that
> process is what's listening on that port). Once the process dies,
> there's nothing else listening.
>
> On 1/1/2014 1:31 PM, Arshak Navruzyan wrote:
>> If anyone wants to look at my live environment please let me know (your
>> gmail) and I will add you to the Google Compute Engine.  Thanks!
>>
>>
>> On Wed, Jan 1, 2014 at 7:58 AM, Arshak Navruzyan <[EMAIL PROTECTED]
>> <mailto:[EMAIL PROTECTED]>> wrote:
>>
>>     Sean
>>
>>     Thanks for looking into the log files.
>>
>>     These are two Google Compute Engine instances under the same project
>>     so there shouldn't be any firewall between them.
>>
>>     For the brief moment that the slave runs during startup, I can nc
>>     into port 9997 from the master to the slave.  But after it crashes,
>>     I can't.  Seems like somehow the problem is on the slave.
>>
>>     Arshak
>>
>>     On Dec 31, 2013 11:58 PM, "Sean Busbey" <busbey+[EMAIL PROTECTED]
>>     <mailto:busbey%[EMAIL PROTECTED]>> wrote:
>>
>>         Well, I can tell you the proximal cause.  The tserver log shows
>>         that it starts normally, then exits because it's told to (via
>>         the zookeeper lock being removed).
>>
>>         If you look at the master debug logs, this happens because the
>>         master fails in three attempts to talk to the tserver, all with
>>         the same error:
>>
>>         2014-01-01 06:17:20,231 [master.Master] ERROR: unable to get
>>         tablet server status 10.240.203.36:9997[1434c70ed30001b]
>>         org.apache.thrift.transport.TTransportException:
>>         java.net.NoRouteToHostException: No route to host
>>
>>         Unfortunately, this is the same error you noticed in your first
>>         email. After 3 of those, the master deletes the zk lock so that
>>         the tserver will shut down.
>>
>>         Could there be another firewall blocking access to port 9997 on
>>         the worker machine from the master machine?
>>
>>         Check from the master (you'll need netcat):
>>
>>         $ nc -z 10.240.203.36 9997
>>         $ echo $?
>>
>>
>>
>>
>>
>>         On Wed, Jan 1, 2014 at 12:33 AM, Arshak Navruzyan
>>         <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
>>
>>             I am probably missing something really basic so I posted
>>             both the master and the slave log files:
>>
>>             https://www.dropbox.com/sh/liv1mzuohyiv6uu/X5kx7AZJ6i
>>
>>             Thanks again to everyone for the help!
>>
>>
>>             On Tue, Dec 31, 2013 at 10:20 PM, Arshak Navruzyan
>>             <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
>>
>>                 disabled selinux (iptables already off) on both master
>>                 and slave but didn't make a difference unfortunately.
>>
>>
>>
>>                 On Tue, Dec 31, 2013 at 9:25 PM, Kurt Christensen
>>                 <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
>>
>>
>>                     SELINUX disabled? IPTABLES configured? I have