HDFS >> mail # dev >> hsync is too slower than hflush


haosdent 2013-08-25, 05:11
Andrew Wang 2013-08-25, 23:07
haosdent 2013-08-26, 02:44
Andrew Wang 2013-08-26, 03:18
haosdent 2013-08-26, 03:21
Re: hsync is too slower than hflush
Hi all,

The DataNode writes files sequentially, so I think the disk seek time should
be very small. Why is the disk seek time 10ms? I think that is too long. Can
we tune the Linux system configuration to reduce the disk seek time?
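
For anyone who wants to see what a sync costs on their own disk, here is a rough local sketch in Python. It is not HDFS code: a plain `flush()` into the OS page cache stands in, loosely and as an assumption, for hflush, while `os.fsync` is the call that the thread says hsync makes each DataNode issue. The names `time_writes` and `compare` are made up for this example.

```python
import os
import tempfile
import time


def time_writes(path, sync, count=20, size=4096):
    """Write `count` chunks of `size` bytes; return mean seconds per write.

    sync=False: flush Python's buffer to the OS page cache only
                (assumed local stand-in for hflush).
    sync=True:  also call os.fsync so the data is pushed to the device
                (the call hsync triggers on each DataNode).
    """
    chunk = b"x" * size
    with open(path, "wb") as f:
        start = time.perf_counter()
        for _ in range(count):
            f.write(chunk)
            f.flush()
            if sync:
                os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    return elapsed / count


def compare(directory=None):
    """Return (mean flush-only seconds, mean flush+fsync seconds)."""
    with tempfile.TemporaryDirectory(dir=directory) as d:
        flush_only = time_writes(os.path.join(d, "flush.bin"), sync=False)
        with_fsync = time_writes(os.path.join(d, "fsync.bin"), sync=True)
    return flush_only, with_fsync


if __name__ == "__main__":
    flush_only, with_fsync = compare()
    print(f"flush only : {flush_only * 1e3:.3f} ms per 4 KB write")
    print(f"with fsync : {with_fsync * 1e3:.3f} ms per 4 KB write")
```

Pass `directory=` to `compare` to point the test at the disk you care about. The numbers depend heavily on the device, the filesystem, and the write cache settings, so treat them as a local data point rather than a prediction of hsync latency.
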
2013/8/26 haosdent <[EMAIL PROTECTED]>

> haha, thank you very much, I get it now.
>
> --
> Best Regards,
> Haosong Huang
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>
>
> On Monday, August 26, 2013 at 11:18 AM, Andrew Wang wrote:
>
> > Ah, I forgot the checksum fsync, so two seeks. Even with 4k writes, 50ms
> > still feels in the right ballpark. Best case it's ~20ms, still way slower
> > than hflush.
> >
> > It's also worth asking if there's other dirty data waiting for writeback,
> > since I believe it can also get written out on an fsync.
> >
> > hflush doesn't durably write to disk, so you're still in danger of losing
> > data if there's a cluster-wide power outage. Because HDFS writes to two
> > different racks, hflush still protects you from single-rack outages. Most
> > people think this is good enough (I believe HBase by default runs with just
> > hflush), but if you *really* want to be sure, pay the cost of hsync and do
> > durable writes.
> >
> >
> > On Sun, Aug 25, 2013 at 7:44 PM, haosdent <[EMAIL PROTECTED]> wrote:
> >
> > > In fact, I write only 4k per hsync. The DataNode writes both the checksum
> > > file and the data file when I hsync data to it. Each of them takes
> > > nearly 25ms, so an hsync call takes nearly 50ms. hflush, by contrast, is
> > > very fast: it takes about 1ms each for the checksum and data writes. If an
> > > hsync takes 50ms, what is the point of using it? Or is my test method wrong?
> > >
> > > --
> > > Best Regards,
> > > Haosong Huang
> > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> > >
> > >
> > > On Monday, August 26, 2013 at 7:07 AM, Andrew Wang wrote:
> > >
> > > > 50ms is believable. hsync makes each DN call fsync and wait for acks, so
> > > > you'd expect at least a disk seek time (~10ms) with some extra time
> > > > depending on how much unsync'd data is being written.
> > > >
> > > > So, just as some back of the envelope math, assuming a disk that can write
> > > > at 100MB/s:
> > > >
> > > > 50ms - 10ms seek = 40ms writing time
> > > > 100 MB/s * 40ms = 4MB
> > > >
> > > > If you're hsync'ing every 4MB, 50ms would be exactly what I'd expect.
> > > >
> > > > Best,
> > > > Andrew
> > > >
> > > >
> > > > On Sat, Aug 24, 2013 at 10:11 PM, haosdent <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi, all. Hadoop has supported hsync, which calls the system's fsync,
> > > > > since 2.0.2. I have tested the performance of hsync() and hflush() again
> > > > > and again, but I found that every hsync() call takes nearly 50ms,
> > > > > while an hflush() call takes just 2ms. In this slide deck
> > > > > (http://www.slideshare.net/enissoz/hbase-and-hdfs-understanding-filesystem-usage,
> > > > > page 18), the author mentions that hsync() is 2x slower than hflush(). So,
> > > > > is anything wrong? Thank you very much and looking forward to your help.
> > > > >
> > > > > --
> > > > > Best Regards,
> > > > > Haosong Huang
> > > > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> > > > >
> > > >
> > >
> > >
> >
> >
> >
>
>
>
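
The back-of-the-envelope reasoning quoted above (seek time plus sequential transfer time) can be written out as a small Python sketch. The function name `sync_latency_ms` and its defaults are illustrative assumptions; the 10ms seek and 100MB/s throughput are the round figures used in the thread, not measurements, and the model ignores acks, network time, and any other dirty data flushed by the same fsync.

```python
def sync_latency_ms(bytes_written, seeks, seek_ms=10.0, throughput_mb_s=100.0):
    """Rough sync latency: seek cost plus sequential transfer time.

    bytes_written   -- unsynced bytes pushed to disk by the sync
    seeks           -- number of seeks (2 if data and checksum files
                       are each fsync'd, as described in the thread)
    seek_ms         -- assumed cost of one seek, in milliseconds
    throughput_mb_s -- assumed sequential write rate, in MB/s (10**6 bytes)
    """
    transfer_ms = bytes_written / (throughput_mb_s * 1_000_000) * 1000
    return seeks * seek_ms + transfer_ms


if __name__ == "__main__":
    # One seek and 4 MB of unsynced data: the first estimate in the thread.
    print(f"{sync_latency_ms(4_000_000, seeks=1):.2f} ms")
    # Two seeks (data + checksum) and a 4 KB write: the later best case.
    print(f"{sync_latency_ms(4096, seeks=2):.2f} ms")
```

With these assumptions the first case comes out at about 50ms and the second at about 20ms, which shows that for small writes the seeks dominate and the transfer time is negligible.
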
Andrew Wang 2013-08-26, 17:44