HDFS, mail # dev - HDFS read/write data throttling

Re: HDFS read/write data throttling
Jay Vyas 2013-11-18, 18:46
Where is the jira for this?

Sent from my iPhone

> On Nov 18, 2013, at 1:25 PM, Andrew Wang <[EMAIL PROTECTED]> wrote:
> Thanks for asking, here's a link:
> http://www.umbrant.com/papers/socc12-cake.pdf
> I don't think there's a recording of my talk unfortunately.
> I'll also copy my comments over to the JIRA, though I'd like to not
> distract too much from what Lohit's trying to do.
> On Wed, Nov 13, 2013 at 2:54 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
>> this is interesting - I've moved my comments over to the JIRA and it would
>> be good for yours to go there too.
>> is there a URL for your paper?
>>> On 13 November 2013 06:27, Andrew Wang <[EMAIL PROTECTED]> wrote:
>>> Hey Steve,
>>> My research project (Cake, published at SoCC '12) was trying to provide
>>> SLAs for mixed workloads of latency-sensitive and throughput-bound
>>> applications, e.g. HBase running alongside MR. This was challenging because
>>> seeks are a real killer. Basically, we had to strongly limit MR I/O to keep
>>> worst-case seek latency down, and did so by putting schedulers on the RPC
>>> queues in HBase and HDFS to restrict queuing in the OS and disk where we
>>> lacked preemption.
>>> Regarding citations of note, most academics consider throughput-sharing to
>>> be a solved problem. It's not dissimilar from normal time slicing, you try
>>> to ensure fairness over some coarse timescale. I think cgroups [1] and
>>> ioprio_set [2] essentially provide this.
>>> Mixing throughput and latency though is difficult, and my conclusion is
>>> that there isn't a really great solution for spinning disks besides
>>> physical isolation. As we all know, you can get either IOPS or bandwidth,
>>> but not both, and it's not a linear tradeoff between the two. If you're
>>> interested in this though, I can dig up some related work from my Cake
>>> paper.
>>> However, since it seems that we're more concerned with throughput-bound
>>> apps, we might be okay just using cgroups and ioprio_set to do
>>> time-slicing. I actually hacked up some code a while ago which passed a
>>> client-provided priority byte to the DN, which used it to set the I/O
>>> priority of the handling DataXceiver accordingly. This isn't the most
>>> outlandish idea, since we've put QoS fields in our RPC protocol for
>>> instance; this would just be another byte. Short-circuit reads are outside
>>> this paradigm, but then you can use cgroup controls instead.
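As a rough illustration of the priority-byte idea above, the sketch below maps a client-supplied byte to the packed value that ioprio_set(2) expects. The class constants and the 13-bit class shift come from the Linux ioprio interface; the byte-to-level mapping and the function names are assumptions invented for this example, not what the hacked-up DataNode code actually did. Issuing the syscall itself from the DataXceiver thread (which would need native code in a Java DataNode) is left out.

```python
# Constants from the Linux ioprio interface (see ioprio_set(2)).
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_RT = 1    # real-time (not used by this sketch)
IOPRIO_CLASS_BE = 2    # best-effort, levels 0 (highest) to 7 (lowest)
IOPRIO_CLASS_IDLE = 3  # only served when the disk is otherwise idle


def ioprio_value(io_class, level):
    """Pack a scheduling class and level the way IOPRIO_PRIO_VALUE does."""
    return (io_class << IOPRIO_CLASS_SHIFT) | level


def priority_byte_to_ioprio(priority_byte):
    """Hypothetical mapping from a client-provided byte to an ioprio value.

    Assumption for illustration: 0 means "background" and maps to the idle
    class; 1..255 spread evenly over best-effort levels 7 (lowest) to 0
    (highest), so a larger byte means a more urgent request.
    """
    if not 0 <= priority_byte <= 255:
        raise ValueError("priority must fit in one unsigned byte")
    if priority_byte == 0:
        return ioprio_value(IOPRIO_CLASS_IDLE, 0)
    level = 7 - (priority_byte - 1) * 8 // 255
    return ioprio_value(IOPRIO_CLASS_BE, level)
```

Keeping the mapping inside the best-effort and idle classes is a deliberate choice in this sketch: it means an untrusted client cannot ask for the real-time class and starve other I/O on the machine.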
>>> My casual conversations with Googlers indicate that there isn't any special
>>> Borg/Omega sauce either, just that they heavily prioritize DFS I/O over
>>> non-DFS. Maybe that's another approach: if we can separate block management
>>> in HDFS, MR tasks could just write their output to a raw HDFS block, thus
>>> bringing a lot of I/O back into the fold of "datanode as I/O manager" for a
>>> machine.
>>> Overall, I strongly agree with you that it's important to first define what
>>> our goals are regarding I/O QoS. The general case is a tarpit, so it'd be
>>> good to carve off useful things that can be done now (like Lohit's
>>> direction of per-stream/FS throughput throttling with trusted clients) and
>>> then carefully grow the scope as we find more use cases we can confidently
>>> solve.
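For readers unfamiliar with per-stream throughput throttling, one common way to do it on a trusted client is a token bucket. The sketch below is only an illustration of that general technique; the thread does not say how Lohit's patch implements throttling, and the class name, method names and parameters here are all made up for the example.

```python
import time


class TokenBucket:
    """Minimal token-bucket throttle for a single stream (illustrative only)."""

    def __init__(self, rate_bytes_per_sec, burst_bytes, clock=time.monotonic):
        self.rate = float(rate_bytes_per_sec)
        self.capacity = float(burst_bytes)
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.clock = clock                # injectable so it can be tested
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed, capped at the burst size.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def delay_for(self, nbytes):
        """Charge nbytes and return how long the caller should sleep.

        The balance may go negative; the returned delay is the time needed
        for the refill rate to pay that debt back.
        """
        self._refill()
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate
```

A writer would call `time.sleep(bucket.delay_for(len(chunk)))` before sending each chunk. Because the client enforces the limit on itself, this only works when clients are trusted, which matches the scope described above.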
>>> Best,
>>> Andrew
>>> [1] cgroups blkio controller
>>> https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt
>>> [2] ioprio_set http://man7.org/linux/man-pages/man2/ioprio_set.2.html
>>> On Tue, Nov 12, 2013 at 1:38 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
>>>> I've looked at it a bit within the context of YARN.
>>>> YARN containers are where this would be ideal, as then you'd be able to
>>>> request IO capacity as well as CPU and RAM. For that to work, the
>>>> throttling would have to be outside the App, as you are trying to limit
>>>> code whether or not it wants to be, and because you probably (*) want