Accumulo >> mail # user >> Determining the cause of a tablet server failure


Mike Hugo 2013-02-27, 17:10
Re: Determining the cause of a tablet server failure
Check the .out and .err files. Out-of-memory errors aren't caught by
log4j; they go to those files instead.
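
A rough sketch of that check, assuming the .out and .err files sit directly in
$ACCUMULO_HOME/logs and that an out-of-memory failure shows up as a line
containing "OutOfMemoryError". The helper name find_oom_lines and the default
search string are illustrative, not part of Accumulo:

```python
import os
from pathlib import Path


def find_oom_lines(log_dir, needle="OutOfMemoryError"):
    """Return (file name, line number, text) for each line containing
    `needle` in the *.out and *.err files directly under log_dir."""
    hits = []
    for path in sorted(Path(log_dir).iterdir()):
        # Only the stdout/stderr capture files; log4j output is skipped.
        if not path.is_file() or path.suffix not in (".out", ".err"):
            continue
        # errors="replace" avoids crashing on stray non-UTF-8 bytes.
        with path.open(errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if needle in line:
                    hits.append((path.name, lineno, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Assumption: logs live under $ACCUMULO_HOME/logs, as in the question.
    log_dir = Path(os.environ.get("ACCUMULO_HOME", ".")) / "logs"
    if log_dir.is_dir():
        for name, lineno, text in find_oom_lines(log_dir):
            print(f"{name}:{lineno}: {text}")
```

Run it on the host whose tserver died; no output only means the search string
was not found, so it is still worth reading the tail of those files by hand.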
On Wed, Feb 27, 2013 at 12:10 PM, Mike Hugo <[EMAIL PROTECTED]> wrote:

> After running an ingest process via map reduce for about an hour or so,
> one of our tservers fails.  It happens pretty consistently, and we're able
> to replicate it without too much difficulty.
>
> I'm looking in the $ACCUMULO_HOME/logs directory for clues as to why the
> tserver fails, but I'm not seeing much that points to a cause of the
> tserver going offline.   One minute it's there, the next it's offline.
> There are some warnings about swappiness, as well as a large row that
> cannot be split, but other than that, not much else to go on.
>
> Is there anything that could help me figure out *why* the tserver died?
>  I'm guessing it's something in our client code or a config that's not
> correct on the server, but it'd be really nice to have a hint before we
> start randomly changing things to see what will fix it.
>
> Thanks,
>
> Mike
>
Eric Newton 2013-02-27, 17:15
Adam Fuchs 2013-02-27, 17:24
Mike Hugo 2013-02-27, 20:17
Adam Fuchs 2013-02-27, 20:27
John Vines 2013-02-27, 20:32
Christopher 2013-02-27, 22:46
Josh Elser 2013-02-27, 23:23