Re: Performance during node failure
>
> I currently have zookeeper running on all 7 data nodes
>

If you ever grow your cluster, don't add a zookeeper for every new node.
Each additional zookeeper slows down zookeeper writes.
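
A rough illustration of why, assuming ZooKeeper's default majority quorum: a write is committed only after more than half of the servers have acknowledged it, so the number of acknowledgements needed grows with the ensemble. The function names below are hypothetical, and this is plain arithmetic, not a performance model.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 5, 7):
    print(f"{n} servers: quorum={quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failures")
```

With 7 servers every write waits on 4 acknowledgements, while 3 servers need only 2 and still survive the loss of one server.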

> with the batchwriters running on the name node. Basically, I was getting a
> number of the following:
>
> client session timed out …
>
> opening socket connection
>
> socket connection established
>
> session establishment complete
>
> …
>
> client session timed out …
>
> repeat
>

>
> I would also occasionally get
>
> session expired for /accumulo/fe7…
>
> as well as
>
> Zookeeper.KeeperException$Connectionloss
>
> Exception: KeeperErrorCode = Connectionloss
>
> for /accumulo/f37…/tables/3b/state
>
> at accumulo.core.zookeeper.ZooCache$2.run
>
> accumulo.core.zookeeper.ZooCache.retry
>
> accumulo.core.zookeeper.ZooCache.get
>
> core.clientimpl.tables.getTableState
>
> core.clientimpl.multiTableBatchWriter.getBatchWriter
>
> myIngestorProcess.run
>

>
> Does anyone know if this is an Accumulo problem, a Zookeeper problem, or
> something else (network overly busy, etc.)?
>
>
>
This happens when:

1) a JVM is swapped out
2) a JVM does a stop-the-world garbage collection
3) there is a network disconnect/interruption

By far, the biggest reason for a lost zookeeper session is that either the
tablet server or the zookeeper process has been pushed into swap.

Make sure that swappiness (the vm.swappiness kernel setting) is set to zero,
that you have ample memory for all your processes, and set the environment
variable MALLOC_ARENA_MAX to 1:

export MALLOC_ARENA_MAX=1
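
A minimal sketch for checking both settings on a Linux node, assuming swappiness is exposed at /proc/sys/vm/swappiness. The helper names are hypothetical. The script only sees its own environment, so it would have to run in the same environment that starts the tablet server and zookeeper processes.

```python
import os
from pathlib import Path


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return vm.swappiness as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def check_settings(swappiness, environ):
    """Return a list of warnings for settings that differ from the advice."""
    problems = []
    if swappiness is None:
        problems.append("could not read vm.swappiness")
    elif swappiness != 0:
        problems.append(f"vm.swappiness is {swappiness}, expected 0")
    if environ.get("MALLOC_ARENA_MAX") != "1":
        problems.append("MALLOC_ARENA_MAX is not set to 1")
    return problems


if __name__ == "__main__":
    for problem in check_settings(read_swappiness(), os.environ):
        print("WARNING:", problem)
```

The script prints nothing when both settings match the advice above.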