HDFS dev mailing list: HDFS read/write data throttling


lohit 2013-11-11, 18:59
Adam Muise 2013-11-11, 19:27
lohit 2013-11-11, 19:47
Haosong Huang 2013-11-12, 02:16
Andrew Wang 2013-11-12, 03:44
Re: HDFS read/write data throttling
2013/11/11 Andrew Wang <[EMAIL PROTECTED]>

> Hey Lohit,
>
> This is an interesting topic, and something I actually worked on in grad
> school before coming to Cloudera. It'd help if you could outline some of
> your use cases and how per-FileSystem throttling would help. For what I was
> doing, it made more sense to throttle on the DN side since you have a
> better view over all the I/O happening on the system, and you have
> knowledge of different volumes so you can set limits per-disk. This still
> isn't 100% reliable though since normally a portion of each disk is used
> for MR scratch space, which the DN doesn't have control over. I tried
> playing with thread I/O priorities here, but didn't see much improvement.
> Maybe the newer cgroups stuff can help out.
>

Thanks. Yes, we also thought about having something on the DataNode. This
would also mean one could easily throttle clients that access the cluster
from outside, for example distcp or hftp copies. Clients need not worry
about throttle configs, and each cluster can control how much throughput
can be achieved. We do want to have something like this.
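
Andrew's suggestion (per-disk limits enforced on the DataNode) can be sketched as one token bucket per volume. This is a minimal illustration under stated assumptions, not HDFS code: the class names `VolumeThrottler` and `PerVolumeThrottle`, the one-second burst allowance, and the idea that a block reader or writer would sleep for the returned delay are all hypothetical. Time is passed in explicitly so the logic is deterministic; a real DataNode would pass `System.nanoTime()`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical token bucket for a single DataNode volume (disk). */
final class VolumeThrottler {
    private final long bytesPerSec;
    private double tokens;          // bytes that may be transferred now (negative = debt)
    private long lastRefillNanos;

    VolumeThrottler(long bytesPerSec, long nowNanos) {
        if (bytesPerSec <= 0) throw new IllegalArgumentException("bytesPerSec must be > 0");
        this.bytesPerSec = bytesPerSec;
        this.tokens = bytesPerSec;  // assumption: allow a burst of up to one second's worth
        this.lastRefillNanos = nowNanos;
    }

    /** Charges 'bytes' to this volume and returns how many nanoseconds the caller should wait. */
    synchronized long reserve(long bytes, long nowNanos) {
        double elapsedSec = (nowNanos - lastRefillNanos) / 1e9;
        tokens = Math.min(bytesPerSec, tokens + elapsedSec * bytesPerSec); // refill, capped at burst
        lastRefillNanos = nowNanos;
        tokens -= bytes;
        return tokens >= 0 ? 0L : (long) Math.ceil(-tokens * 1e9 / bytesPerSec);
    }
}

/** Hypothetical DataNode-side registry: one independent limit per volume. */
final class PerVolumeThrottle {
    private final Map<String, VolumeThrottler> byVolume = new HashMap<>();
    private final long bytesPerSecPerVolume;

    PerVolumeThrottle(long bytesPerSecPerVolume) {
        this.bytesPerSecPerVolume = bytesPerSecPerVolume;
    }

    /** Returns the delay (ns) before 'bytes' may be read from or written to 'volume'. */
    synchronized long reserve(String volume, long bytes, long nowNanos) {
        VolumeThrottler t = byVolume.computeIfAbsent(
                volume, v -> new VolumeThrottler(bytesPerSecPerVolume, nowNanos));
        return t.reserve(bytes, nowNanos);
    }
}
```

Because the limiter would live on the DataNode, it would apply equally to in-cluster tasks and to outside clients such as distcp or hftp copies, but, as Andrew notes, it still cannot see MR scratch-space I/O on the same disks.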

>
> I'm sure per-FileSystem throttling will have some benefits (and probably be
> easier than some DN-side implementation) but again, it'd help to better
> understand the problem you are trying to solve.
>

One idea was flexibility: clients could override the value and set their
own. On a trusted cluster we could allow clients to go beyond the default
value for some use cases. Alternatively, we also thought about having a
default value and a max value, where clients could change the default but
not go beyond the max. Another problem with a DN-side config is having
different values for different clients and easily changing those for
selected clients.
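
The "default plus max" policy described above could be resolved per client roughly as follows. This is a hypothetical sketch: `ThrottlePolicy`, the per-user cap table, and the MB/s units are assumptions for illustration, not an existing Hadoop configuration or API.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical cluster-side policy: a default rate, a global max, and per-user caps. */
final class ThrottlePolicy {
    private final long defaultMbPerSec;
    private final long maxMbPerSec;
    // Admin-editable at runtime, hence a concurrent map.
    private final Map<String, Long> perUserMaxMbPerSec = new ConcurrentHashMap<>();

    ThrottlePolicy(long defaultMbPerSec, long maxMbPerSec) {
        if (defaultMbPerSec <= 0 || maxMbPerSec < defaultMbPerSec) {
            throw new IllegalArgumentException("need 0 < default <= max");
        }
        this.defaultMbPerSec = defaultMbPerSec;
        this.maxMbPerSec = maxMbPerSec;
    }

    /** Admin override: raise (trusted users) or lower (noisy users) one user's cap. */
    void setUserMax(String user, long mbPerSec) {
        perUserMaxMbPerSec.put(user, mbPerSec);
    }

    /** The rate a client actually gets: its request (or the default), clamped to its cap. */
    long effectiveMbPerSec(String user, OptionalLong requestedMbPerSec) {
        long cap = perUserMaxMbPerSec.getOrDefault(user, maxMbPerSec);
        long wanted = (requestedMbPerSec.isPresent() && requestedMbPerSec.getAsLong() > 0)
                ? requestedMbPerSec.getAsLong()
                : defaultMbPerSec;
        return Math.min(wanted, cap);
    }
}
```

A per-user cap above the global max models the trusted-cluster case, while a lower cap lets an admin restrict a single noisy client without touching anyone else's settings.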

As Haosong also suggested, we could wrap FSDataInputStream/FSDataOutputStream
in throttled streams like ThrottledInputStream. But we might have to be
careful of any code that uses FileSystem APIs and accidentally throttles
itself (like reducer copy, distributed cache and such...).
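
Wrapping the stream returned by `FileSystem.open()` could look like the following. This is a hypothetical `RateLimitedInputStream` written against plain `java.io`, not the hadoop-distcp `ThrottledInputStream` Haosong linked; it simply sleeps whenever the average rate since the stream was opened exceeds `maxBytesPerSec`.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;

/** Hypothetical rate-limited wrapper; illustrates the idea only. */
class RateLimitedInputStream extends FilterInputStream {
    private final long maxBytesPerSec;
    private final long startNanos = System.nanoTime();
    private long bytesRead;

    RateLimitedInputStream(InputStream in, long maxBytesPerSec) {
        super(in);
        if (maxBytesPerSec <= 0) throw new IllegalArgumentException("maxBytesPerSec must be > 0");
        this.maxBytesPerSec = maxBytesPerSec;
    }

    @Override
    public int read() throws IOException {
        throttle();
        int b = super.read();
        if (b >= 0) bytesRead++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        throttle();
        // Cap each call at one second's budget so a single large read cannot overshoot much.
        int n = super.read(buf, off, (int) Math.min(len, maxBytesPerSec));
        if (n > 0) bytesRead += n;
        return n;
    }

    /** Sleeps until the average rate since construction is back under the limit. */
    private void throttle() throws IOException {
        long elapsedNanos = System.nanoTime() - startNanos;
        long budgetNanos = (long) (bytesRead * 1e9 / maxBytesPerSec); // time these bytes "should" take
        long waitNanos = budgetNanos - elapsedNanos;
        if (waitNanos > 0) {
            try {
                Thread.sleep(waitNanos / 1_000_000, (int) (waitNanos % 1_000_000));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while throttling");
            }
        }
    }

    long getBytesRead() {
        return bytesRead;
    }
}
```

Applying the wrapper only at call sites that explicitly opt in (for example `new RateLimitedInputStream(fs.open(path), limit)`), rather than inside `FileSystem` itself, is one way to avoid the self-throttling problem for reducer copies and the distributed cache.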

> Best,
> Andrew
>
>
> On Mon, Nov 11, 2013 at 6:16 PM, Haosong Huang <[EMAIL PROTECTED]> wrote:
>
> > Hi, lohit. There is a class named ThrottledInputStream
> > (http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java)
> > in hadoop-distcp; you could check it out for more details.
> >
> > In addition, I am working on this and trying to achieve resource
> > control (including CPU, network, and disk I/O) in the JVM. But my
> > implementation depends on cgroups, which only run on Linux. I will push
> > my library (java-cgroup) to GitHub in the next several months. If you are
> > interested in it, please give me any advice and help me improve it. :-)
> >
> >
> > On Tue, Nov 12, 2013 at 3:47 AM, lohit <[EMAIL PROTECTED]> wrote:
> >
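> > > Hi Adam,
> > >
> > > Thanks for the reply. The changes I was referring to were in the
> > > FileSystem.java layer, which should not affect HDFS replication/NameNode
> > > operations. To give a better idea, this would affect clients something
> > > like this:
> > >
> > > import org.apache.hadoop.conf.Configuration;
> > > import org.apache.hadoop.fs.FSDataInputStream;
> > > import org.apache.hadoop.fs.FileSystem;
> > > import org.apache.hadoop.fs.Path;
> > >
> > > Configuration conf = new Configuration();
> > > conf.setInt("read.bandwidth.mbpersec", 20); // proposed key (not an existing setting): 20 MB/s
> > > FileSystem fs = FileSystem.get(conf);
> > > FSDataInputStream fis = fs.open(new Path("/path/to/file.txt")); // open() takes a Path
> > > fis.read(); // <-- reads on this stream would be capped at 20 MB/s
> > >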
> > > 2013/11/11 Adam Muise <[EMAIL PROTECTED]>
> > >
> > > > See https://issues.apache.org/jira/browse/HDFS-3475
> > > >
> > > > Please note that this has had many unexpected impacts on workload. Be
> > > > careful and be mindful of your DataNode memory and network capacity.
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Nov 11, 2013 at 1:59 PM, lohit <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hello Devs,
> > > > >
> > > > > Wanted to reach out and see if anyone has thought about the ability to throttle

Have a Nice Day!
Lohit
Steve Loughran 2013-11-12, 09:38
Andrew Wang 2013-11-13, 06:27
Steve Loughran 2013-11-13, 10:54
Andrew Wang 2013-11-18, 18:25
Jay Vyas 2013-11-18, 18:46
Andrew Wang 2013-11-18, 21:25