These symptoms appear to be caused by problems with table
operations, which depend heavily on the master being able to
use data in ZooKeeper.
So, try to find the first errors, especially those related to
serialization or deserialization closest to when the master first
What do you get when you run:
$ ./bin/accumulo org.apache.accumulo.server.fate.Admin print
On Fri, Nov 1, 2013 at 4:17 PM, Dave Mullins <[EMAIL PROTECTED]> wrote:
> Hadoop version 0.20.2-cdh3u5
> This was installed from the CDH RPMs but is not controlled by a Cloudera
> I read what documentation I could find on the upgrade.
> I installed from the tarball version of 1.5.0.
> I made sure to include the Commons Collections library in the Accumulo library path.
> I made sure to set dfs.support.append to true in the hdfs-site.xml files.
> I did a complete restart (including a reboot) of the system.
> All of the tablet servers come online.
> All of the master's services come online and seem to be working. (The monitor
> does show the correct number of tablets, tablet servers, and so forth.)
> I am able to use some of the features of the Accumulo shell.
> I can display the contents of a table.
> I can't create or delete a table without getting the following error:
> [impl.ThriftTransportPool] WARN: Thread "shell" stuck on io to
> x.x.x.x:9999:9999 (0) for at least 120040 ms
> When I go digging in the logs I find very few errors. (These systems are not
> on a network I can copy and paste from, so I am trying to represent the issue
> as best I can.)
> There are 4 errors saying that the Repo runner [0-3] threads died.
> Another error that springs up occasionally is: WARN: Thread "GC" stuck on
> io to x.x.x.x:9999:9999 (0) for at least 120040 ms
> A netstat run before I start the master up shows nothing running on port
> 9999 nor any connections to that port.
> A netstat run shortly after starting Accumulo shows about 16 connections from
> the master in a TIME_WAIT state, in the 35k-36k port range. It also shows one
> ESTABLISHED connection in both directions (36783), and one inbound from port
> 9999 to port 47636, also from the master.
> It seems after this point anything that tries to connect to port 9999 goes
> into a TIME_WAIT and never does anything.
> I have checked all the permissions I can think of and everything seems to be
> HDFS is running correctly, and jobs not associated with Accumulo all seem to
> be working.
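To tally the connection states for port 9999 that the report describes (TIME_WAIT versus ESTABLISHED), one option is to filter `netstat -tan` output by port. This is a minimal sketch, assuming Linux `netstat -tan` column layout (local address in column 4, foreign address in column 5, state in column 6); the function name `count_port_states` is hypothetical and not part of Accumulo.

```shell
# Hypothetical helper: count TCP states for connections touching a given port.
# Reads `netstat -tan`-style lines on stdin. Assumes column 4 is the local
# address, column 5 the foreign address, and column 6 the TCP state.
count_port_states() {
  port="$1"
  awk -v p=":$port" '
    # Keep lines whose local or foreign address ends exactly in ":<port>".
    substr($4, length($4) - length(p) + 1) == p ||
    substr($5, length($5) - length(p) + 1) == p { print $6 }
  ' | sort | uniq -c
}

# Usage on the master host:
#   netstat -tan | count_port_states 9999
```

Running this before and after starting the master makes it easy to see whether connections to 9999 pile up in TIME_WAIT rather than staying ESTABLISHED.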