ZooKeeper >> mail # user >> Timeouts and ping handling


Thread:
Manosiz Bhattacharyya   2012-01-18, 21:26
Patrick Hunt            2012-01-18, 22:34
Patrick Hunt            2012-01-18, 22:41
Ted Dunning             2012-01-18, 22:47
Manosiz Bhattacharyya   2012-01-18, 22:47
Patrick Hunt            2012-01-18, 22:53
Manosiz Bhattacharyya   2012-01-19, 00:47
Ted Dunning             2012-01-19, 00:54
Manosiz Bhattacharyya   2012-01-19, 01:47
Ted Dunning             2012-01-19, 01:15
Manosiz Bhattacharyya   2012-01-19, 01:41
Patrick Hunt            2012-01-19, 01:49
Manosiz Bhattacharyya   2012-01-19, 02:18
Ted Dunning             2012-01-19, 06:18
Manosiz Bhattacharyya   2012-01-19, 17:31
Patrick Hunt            2012-01-19, 18:09
Manosiz Bhattacharyya   2012-01-19, 18:48
Re: Timeouts and ping handling
ZK does pretty much entirely sequential I/O.

One thing that it does which might be very, very bad for SSD is that it
pre-allocates disk extents in the log by writing a bunch of zeros.  This is
to avoid directory updates as the log is written, but it doubles the load
on the SSD.
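
To make the preallocation concrete, here is a minimal sketch of the zero-fill pattern Ted describes, under stated assumptions: the 64 MB extent and the method shape are illustrative (ZooKeeper's actual extent size is its configurable preAllocSize setting), not ZooKeeper's exact implementation. Every extent ends up written twice, once as zeros and once with real transactions, which is the doubled SSD load.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    // Sketch of zero-fill log preallocation: grow the transaction log in
    // large extents ahead of the write position, so subsequent appends do
    // not force file-size/directory metadata updates on every fsync.
    public class LogPadding {
        // Illustrative extent size; ZooKeeper's preAllocSize is configurable.
        private static final long EXTENT = 64L * 1024 * 1024;

        // Call before each append: if the write position is near the end of
        // the preallocated region, extend the file by writing zeros.
        static void padFile(FileChannel log) throws IOException {
            ByteBuffer zeros = ByteBuffer.allocate(1 << 20); // 1 MB of zeros
            long size = log.size();
            if (log.position() + 4096 >= size) {
                long target = size + EXTENT;
                for (long pos = size; pos < target; pos += zeros.capacity()) {
                    zeros.rewind();
                    // Positional write; a real implementation would handle
                    // partial writes, which this sketch glosses over.
                    log.write(zeros, pos);
                }
            }
        }
    }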

On Thu, Jan 19, 2012 at 5:31 PM, Manosiz Bhattacharyya
<[EMAIL PROTECTED]> wrote:

> I do not think that there is a problem with the queue size. I guess the
> problem is more with latency when the Fusion I/O goes in for a GC. We are
> enabling stats on ZooKeeper and the Fusion I/O to be more precise. Does
> ZooKeeper typically do only sequential I/O, or does it do some random I/O
> too? We could then move the logs to a disk.
>
> Thanks,
> Manosiz.
>
> On Wed, Jan 18, 2012 at 10:18 PM, Ted Dunning <[EMAIL PROTECTED]>
> wrote:
>
> > If you aren't pushing much data through ZK, there is almost no way
> > that the request queue can fill up without the log or snapshot disks
> > being slow. See what happens if you put the log into a real disk or
> > (heaven help us) onto a tmpfs partition.
> >
> > On Thu, Jan 19, 2012 at 2:18 AM, Manosiz Bhattacharyya
> > <[EMAIL PROTECTED]> wrote:
> >
> > > I will do as you mention.
> > >
> > > We are using the async APIs throughout, and we do not write much
> > > data into ZooKeeper. We use it only for leader election and health
> > > monitoring, which is why we typically see the timeouts on idle
> > > ZooKeeper connections.
> > >
> > > The reason we want the sessions to stay alive is the leader election
> > > algorithm we use from the ZooKeeper recipes. If the leader node's
> > > connection is broken, the ephemeral node that guaranteed its
> > > leadership is lost, and reconnecting creates a new node which does
> > > not guarantee leadership. We then have to elect a new leader, which
> > > requires significant work. The bigger the timeout, the longer the
> > > cluster stays without a master for a particular service, since the
> > > old master cannot keep working once it knows its session, and with
> > > it its ephemeral node, is gone. As we are trying to build a highly
> > > available service (not internet scale, but at the scale of a storage
> > > system with millisecond latencies), we thought about reducing the
> > > timeout while keeping the session open. Also note that the node that
> > > is typically the master does not write to ZooKeeper very often.
> > >
> > > Thanks,
> > > Manosiz.
> > >
> > > On Wed, Jan 18, 2012 at 5:49 PM, Patrick Hunt <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > On Wed, Jan 18, 2012 at 4:47 PM, Manosiz Bhattacharyya
> > > > <[EMAIL PROTECTED]> wrote:
> > > > > Thanks Patrick for your answer,
> > > >
> > > > No problem.
> > > >
> > > > > Actually we are in a virtualized environment, and we have a FIO
> > > > > disk for the transaction logs. It sometimes shows some latency
> > > > > during FIO garbage collection. We know this could be the
> > > > > potential issue, but we were trying to work around it.
> > > >
> > > > Ah, I see. I saw something very similar to this recently with SSDs
> > > > used for the datadir. The fdatasync latency was sometimes > 10
> > > > seconds. I suspect it happened as a result of disk GC activity.
> > > >
> > > > I was able to identify the problem by running something like this:
> > > >
> > > > sudo strace -r -T -f -p 8066 -e trace=fsync,fdatasync -o trace.txt
> > > >
> > > > and then graphing the results (log scale). You should try running
> > > > this against your servers to confirm that it is indeed the
> > > > problem.
> > > >
> > > > > We were trying to classify the requests into two types: either
> > > > > heartbeats (HBs) or normal requests. Isn't it better to reject
> > > > > normal requests if the queue fills beyond a certain threshold,
> > > > > but keep the session alive? That way flow control can be
> > > > > achieved by the user's session retrying the
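
As a rough illustration of Patrick's diagnostic suggestion upthread, a small program along these lines could summarize the fdatasync latencies in trace.txt; the file name and strace's -T output format (syscall duration in trailing angle brackets, e.g. "fdatasync(35) = 0 <0.000042>") are taken from the command above, and a percentile summary stands in here for the graphing step.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Summarize fsync/fdatasync latencies from `strace -T` output.
    public class FsyncLatency {
        // Matches a traced call and captures the duration strace appends
        // in angle brackets at the end of the line.
        private static final Pattern DURATION =
                Pattern.compile("f(?:data)?sync\\(.*<([0-9.]+)>\\s*$");

        public static void main(String[] args) throws IOException {
            List<Double> secs = new ArrayList<>();
            for (String line : Files.readAllLines(Paths.get(args[0]))) {
                Matcher m = DURATION.matcher(line);
                if (m.find()) {
                    secs.add(Double.parseDouble(m.group(1)));
                }
            }
            if (secs.isEmpty()) {
                System.out.println("no fsync/fdatasync calls found");
                return;
            }
            Collections.sort(secs);
            int n = secs.size();
            System.out.printf("calls=%d p50=%.6fs p99=%.6fs max=%.6fs%n",
                    n, secs.get(n / 2),
                    secs.get(Math.min(n - 1, (int) (n * 0.99))),
                    secs.get(n - 1));
        }
    }

A multi-second p99 or max here would point at the disk-GC stalls discussed above rather than at ZooKeeper itself.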
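For context on the ephemeral-node leader election Manosiz describes upthread, a minimal sketch of that pattern might look like the following; the znode path /election/leader and the method shape are illustrative assumptions, not the exact recipe their service uses.

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Leadership is held by whichever session owns the ephemeral znode.
    // If that session expires, ZooKeeper deletes the znode and the
    // cluster is leaderless until some node re-creates it, which is why
    // a longer session timeout means a longer leaderless window.
    public class LeaderElection {
        static boolean tryBecomeLeader(ZooKeeper zk, String myId)
                throws KeeperException, InterruptedException {
            try {
                zk.create("/election/leader", myId.getBytes(),
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                return true;  // our ephemeral znode exists: we lead
            } catch (KeeperException.NodeExistsException e) {
                return false; // another live session holds leadership
            }
        }
    }

A longer session timeout keeps this znode alive across brief stalls (such as a disk-GC pause), but, as Manosiz notes, it also stretches the time the cluster runs without a master once a session genuinely expires.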
Thread (continued):
Manosiz Bhattacharyya   2012-01-19, 18:49
Patrick Hunt            2012-01-19, 19:31
Manosiz Bhattacharyya   2012-01-19, 19:47