HDFS, mail # dev - hsync is too slower than hflush


haosdent 2013-08-25, 05:11
Andrew Wang 2013-08-25, 23:07
haosdent 2013-08-26, 02:44
Andrew Wang 2013-08-26, 03:18
Re: hsync is too slower than hflush
haosdent 2013-08-26, 03:21
haha, thank you very much, I get it now.

--
Best Regards,
Haosong Huang
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
On Monday, August 26, 2013 at 11:18 AM, Andrew Wang wrote:

> Ah, I forgot the checksum fsync, so two seeks. Even with 4k writes, 50ms
> still feels in the right ballpark. Best case it's ~20ms, still way slower
> than hflush.
>
> It's also worth asking if there's other dirty data waiting for writeback,
> since I believe it can also get written out on an fsync.
>
> hflush doesn't durably write to disk, so you're still in danger of losing
> data if there's a cluster-wide power outage. Because HDFS writes to two
> different racks, hflush still protects you from single-rack outages. Most
> people think this is good enough (I believe HBase by default runs with just
> hflush), but if you *really* want to be sure, pay the cost of hsync and do
> durable writes.
>
>
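The difference between handing data off and forcing it to disk can be seen on a local file as well. The sketch below is only a local-filesystem analog, not the HDFS API: a plain `FileChannel.write` stands in for hflush (data leaves the process but may still sit in the OS page cache), and `FileChannel.force(true)` stands in for the fsync that hsync triggers on each datanode. The class name `FlushVsSync` and the helper `timeWrites` are hypothetical names chosen for this illustration, and the timings depend entirely on the disk and filesystem in use.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FlushVsSync {

    /**
     * Writes `count` chunks of `chunkSize` bytes to `file`.
     * If syncEachWrite is true, forces data and metadata to the device
     * after every chunk (the local analog of an fsync per write).
     * Returns the elapsed time in nanoseconds.
     */
    static long timeWrites(Path file, int count, int chunkSize, boolean syncEachWrite)
            throws IOException {
        byte[] chunk = new byte[chunkSize];
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            long start = System.nanoTime();
            for (int i = 0; i < count; i++) {
                ByteBuffer buf = ByteBuffer.wrap(chunk);
                while (buf.hasRemaining()) {
                    ch.write(buf);          // hands the bytes to the OS, not necessarily to disk
                }
                if (syncEachWrite) {
                    ch.force(true);         // blocks until the OS reports the data as durable
                }
            }
            return System.nanoTime() - start;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("flush-vs-sync", ".bin");
        try {
            int count = 50;
            int chunkSize = 4096;           // 4k writes, as in the test described in this thread
            long noSync = timeWrites(file, count, chunkSize, false);
            long withSync = timeWrites(file, count, chunkSize, true);
            System.out.printf("write only:    %.3f ms per write%n", noSync / 1e6 / count);
            System.out.printf("write + force: %.3f ms per write%n", withSync / 1e6 / count);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

On a spinning disk the forced variant would be expected to cost on the order of a seek per call, while on an SSD or a filesystem with a volatile write cache the gap can be much smaller.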
> On Sun, Aug 25, 2013 at 7:44 PM, haosdent <[EMAIL PROTECTED]> wrote:
>
> > In fact, I only write 4k per hsync. The datanode writes both the checksum
> > file and the data file when I hsync data to it. Each of those writes takes
> > nearly 25ms, so one hsync call takes nearly 50ms. hflush, by contrast, is
> > very fast: it takes about 1ms each for the checksum and the data. If an
> > hsync takes 50ms, what is the point of using it? Or is my test method wrong?
> >
> > --
> > Best Regards,
> > Haosong Huang
> > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> >
> >
> > On Monday, August 26, 2013 at 7:07 AM, Andrew Wang wrote:
> >
> > > 50ms is believable. hsync makes each DN call fsync and wait for acks, so
> > > you'd expect at least a disk seek time (~10ms) with some extra time
> > > depending on how much unsync'd data is being written.
> > >
> > > So, just as some back of the envelope math, assuming a disk that can
> > > write at 100MB/s:
> > >
> > > 50ms - 10ms seek = 40ms writing time
> > > 100 MB/s * 40ms = 4MB
> > >
> > > If you're hsync'ing every 4MB, 50ms would be exactly what I'd expect.
> > >
> > > Best,
> > > Andrew
> > >
> > >
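The back-of-the-envelope estimate above can be written as a small model: latency is roughly the number of seeks times the seek time, plus the unsynced bytes divided by the disk throughput. The sketch below uses only the figures quoted in this thread (about 10ms per seek, 100MB/s, with MB taken as 10^6 bytes); the class and method names are hypothetical, and the two-seek case reflects the extra checksum fsync mentioned in the later reply. It is a rough model, not a measurement of any real datanode.

```java
public class HsyncLatencyEstimate {

    // Assumed figures taken from the thread, not measured values.
    static final double SEEK_MS = 10.0;
    static final double THROUGHPUT_BYTES_PER_MS = 100_000_000.0 / 1000.0; // 100 MB/s

    /** Estimated time in ms for one sync: seek cost plus transfer time of the unsynced bytes. */
    static double estimateMs(int seeks, long unsyncedBytes) {
        return seeks * SEEK_MS + unsyncedBytes / THROUGHPUT_BYTES_PER_MS;
    }

    /** Bytes that fit in a latency budget once the seeks have been paid for. */
    static long bytesWithinBudget(double budgetMs, int seeks) {
        double writingMs = Math.max(0.0, budgetMs - seeks * SEEK_MS);
        return (long) (writingMs * THROUGHPUT_BYTES_PER_MS);
    }

    public static void main(String[] args) {
        // One seek plus 4 MB of unsynced data.
        System.out.printf("4 MB, one seek:   %.2f ms%n", estimateMs(1, 4_000_000L));
        // Two seeks (data file and checksum file) plus a 4k write.
        System.out.printf("4 KB, two seeks:  %.2f ms%n", estimateMs(2, 4096L));
        // How much data a 50 ms sync could cover after a single seek.
        System.out.printf("50 ms, one seek:  %d bytes%n", bytesWithinBudget(50.0, 1));
    }
}
```

With these assumptions the transfer time for a 4k write is negligible, so the model says small hsync calls are dominated by seek cost rather than by the amount of data written.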
> > > On Sat, Aug 24, 2013 at 10:11 PM, haosdent <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi, all. Hadoop has supported hsync, which calls the system's fsync, since
> > > > 2.0.2. I have tested the performance of hsync() and hflush() again and
> > > > again, and I found that every hsync() call takes nearly 50ms
> > > > while an hflush() call takes just 2ms. In this slide deck
> > > > (http://www.slideshare.net/enissoz/hbase-and-hdfs-understanding-filesystem-usage, page 18),
> > > > the author mentions that hsync() is 2x slower than hflush(). So,
> > > > is anything wrong? Thank you very much; I look forward to your help.
> > > >
> > > > --
> > > > Best Regards,
> > > > Haosong Huang
> > > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> > > >
> > >
> >
> >
>
>
>
lei liu 2013-08-26, 14:30
Andrew Wang 2013-08-26, 17:44