Jun Rao 2012-10-03, 16:59
Hi,
Will storing the ZK commit log on SSD improve ZK write latency? Does a ZK write wait until data is flushed to disk?
Thanks,
Jun
Patrick Hunt 2012-10-04, 01:13
My experience with SSDs and ZK has been discouraging. SSDs have some really terrible corner cases for latency. I've seen them take 40+ seconds (that's not a mistake - seconds) for fsync to complete. When this happened (every few hours) all of the sessions would time out.

See this article: http://storagemojo.com/2012/06/07/the-ssd-write-cliff-in-real-life/

Patrick
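One way to check a candidate device for the kind of fsync stall described above is to time fsync directly. The following is a minimal sketch, not anything taken from ZooKeeper: the function names, sample count, and record size are illustrative assumptions, and a realistic test would run for hours, since the stalls reported here appeared only every few hours.

```python
import os
import tempfile
import time


def measure_fsync_latencies(directory, samples=100, record_size=1024):
    """Append fixed-size records to a scratch file, fsync after each one,
    and return the per-call fsync latency in seconds."""
    latencies = []
    record = b"\0" * record_size
    # The scratch file is created in, and removed from, the directory under test.
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        for _ in range(samples):
            f.write(record)
            f.flush()                      # push Python's buffer to the OS
            start = time.perf_counter()
            os.fsync(f.fileno())           # ask the OS to flush the file to the device
            latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Return (median, worst) latency; the worst case is what threatens sessions."""
    ordered = sorted(latencies)
    return ordered[len(ordered) // 2], ordered[-1]
```

Pointing `measure_fsync_latencies` at a directory on the device in question and comparing the worst value against the session timeout gives a rough idea of whether a stall could expire sessions.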
Jun Rao 2012-10-04, 04:12
Patrick,

Thanks for the info. Does each ZK write wait for the log to be flushed to disk?

Jun
Patrick Hunt 2012-10-04, 04:28
On Wed, Oct 3, 2012 at 9:12 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> Does each ZK write wait for the log to be flushed to disk?

Yes (it's necessary for the guarantees we provide), although the servers do batching of writes to improve throughput. This doc gives some insight: http://zookeeper.apache.org/doc/r3.4.4/zookeeperOver.html#Performance

Patrick
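The batching mentioned above can be illustrated with a toy append-only log that queues records and makes the whole batch durable with a single fsync. This is only a sketch of the general group-commit idea under stated assumptions: `BatchingLog` and its methods are hypothetical names, and none of this is ZooKeeper's actual implementation.

```python
import os
import tempfile


class BatchingLog:
    """Toy append-only log: queue records, then persist the batch with one fsync."""

    def __init__(self, path):
        self._file = open(path, "ab")
        self._pending = []
        self.fsync_calls = 0

    def append(self, record):
        # Queue only; the write must not be acknowledged yet.
        self._pending.append(record)

    def commit(self):
        """Write all queued records, fsync once, and return how many were flushed."""
        if not self._pending:
            return 0
        for record in self._pending:
            self._file.write(record)
        self._file.flush()
        os.fsync(self._file.fileno())      # one sync covers the whole batch
        self.fsync_calls += 1
        count = len(self._pending)
        self._pending.clear()
        return count

    def close(self):
        self.commit()
        self._file.close()
```

Every write still waits for a sync before it can be acknowledged, but the cost of that sync is shared by all writes in the batch, which is why batching helps throughput more than it helps per-write latency.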
Ted Dunning 2012-10-04, 04:28
Yes.

And Patrick's experience is not unexpected. There is, however, a huge variation among different types of flash memory. The software driving the flash can also result in a very different experience. The experiences that he alludes to are likely with a conventional SSD packaging of flash driven via the normal block device emulator. That can be substantially sub-optimal, depending on which vendor and configuration you use.
Milind Parikh 2012-10-04, 04:32
It does seem so.

"The most performance critical part of ZooKeeper is the transaction log. ZooKeeper syncs transactions to media before it returns a response. A dedicated transaction log device is key to consistent good performance. Putting the log on a busy device will adversely effect performance. If you only have one storage device, put trace files on NFS and increase the snapshotCount; it doesn't eliminate the problem, but it should mitigate it."
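A quick sanity check for the "dedicated transaction log device" advice is to compare the device IDs of the snapshot directory and the transaction log directory (`dataDir` and `dataLogDir` in the ZooKeeper configuration). The sketch below is an assumption-laden illustration: `on_same_device` is a hypothetical helper, and `st_dev` identifies a filesystem rather than a physical disk, so two partitions of one disk will still look like different devices.

```python
import os


def on_same_device(path_a, path_b):
    """Return True if both paths live on the same filesystem (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

If this returns True for the two directories, the transaction log is competing with snapshot writes on the same filesystem; if it returns False, the physical layout still has to be confirmed separately.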
Andrew Purtell 2012-10-04, 04:39
Even so, I've seen in notes from attendees of Amazon's "DynamoDB For Developers" talks that Amazon says they found it necessary to work "extensively" with their SSD vendor (not stated publicly AFAIK) to engineer out latency spikes. I'd imagine they started with a strong vendor and not a low-end device, but of course this is just speculation.
Ted Dunning 2012-10-04, 04:51
This is a good observation.
Ben Bangert 2012-10-04, 16:05
On Oct 3, 2012, at 6:13 PM, Patrick Hunt <[EMAIL PROTECTED]> wrote:
> See this article:
> http://storagemojo.com/2012/06/07/the-ssd-write-cliff-in-real-life/

It's worth noting that these tests are all on Enterprise SSD products, which have actually been lagging behind some of the advances the SSD controller folks have been making. I've had the same corner case on my own desktop SSD in the past with a huge write cliff, but this has gone away with some of the later heavily over-provisioned SSDs I've bought, such as this OWC 6G one I'm using.

Of course, these Enterprise folks are the same ones that prefer to scale vertically rather than horizontally using cheaper commodity hardware. The most useful factors to look at when choosing the SSD are the write amplification factor (http://www.anandtech.com/show/5719/ocz-vertex-4-review-256gb-512gb), and how it handles the case when the drive runs out of free space (and thus has to garbage collect, resulting in the write cliff). An over-provisioned drive can avoid the write cliff because a chunk of the drive is reserved in advance to prevent it from ever getting completely full. See results here:

Over-provisioned SSD: http://macperformanceguide.com/SSD-RealWorld-BeforeAfter-OWC.html
Non-overprovisioned SSD: http://macperformanceguide.com/SSD-RealWorld-BeforeAfter-CrucialRealSSD.html

If you look through, there are some very worrying write cliffs that are very apparent in SSDs that aren't over-provisioned, and they easily fail to perform as well as a RAID of platter drives.

The other thing about the storagemojo article worth thinking about is whether you're actually going to buy a 12+ disk array for a faster ZK log... or are actually comparing a single platter disk vs. a single SSD.

Cheers,
Ben
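The two factors named above reduce to simple ratios. The sketch below uses the common definitions (spare capacity relative to user-visible capacity, and bytes written to flash per byte written by the host); the function names are hypothetical and the capacities in the usage lines are made-up numbers, not figures for any drive mentioned in this thread.

```python
def over_provisioning_ratio(physical_gb, user_gb):
    """Spare capacity as a fraction of user-visible capacity."""
    return (physical_gb - user_gb) / user_gb


def write_amplification(nand_bytes_written, host_bytes_written):
    """Bytes the flash actually wrote per byte the host asked to write."""
    return nand_bytes_written / host_bytes_written


# Illustrative numbers only:
light = over_provisioning_ratio(256, 240)   # 16 GB spare, roughly 7% of user capacity
heavy = over_provisioning_ratio(128, 100)   # 28 GB spare, 28% of user capacity
```

A higher over-provisioning ratio gives the controller more free blocks for garbage collection, which is the mechanism credited above for avoiding the write cliff; a write amplification factor close to 1 means little extra flash traffic per host write.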
Flavio Junqueira 2012-10-17, 20:42
Let me add one observation to this thread. If you check slide 69 of this presentation:

https://cwiki.apache.org/confluence/download/attachments/24193445/keynote-hic-2011-web.pdf

the graph shows that not writing to disk (net only) does not actually improve write latency much, unless your disk write buffer is turned off. Unless there has been some important performance improvement I missed, it doesn't look like a faster device for the transaction log would be able to improve latency much at this point. Does that sound right?

-Flavio