HDFS >> mail # dev >> HDFS read/write data throttling


Earlier messages in this thread (collapsed):
  lohit (2013-11-11, 18:59)
  Adam Muise (2013-11-11, 19:27)
  lohit (2013-11-11, 19:47)
  Haosong Huang (2013-11-12, 02:16)
Re: HDFS read/write data throttling
Hey Lohit,

This is an interesting topic, and something I actually worked on in grad
school before coming to Cloudera. It'd help if you could outline some of
your use cases and how per-FileSystem throttling would help. For what I was
doing, it made more sense to throttle on the DN side since you have a
better view over all the I/O happening on the system, and you have
knowledge of different volumes so you can set limits per-disk. This still
isn't 100% reliable though since normally a portion of each disk is used
for MR scratch space, which the DN doesn't have control over. I tried
playing with thread I/O priorities here, but didn't see much improvement.
Maybe the newer cgroups stuff can help out.

I'm sure per-FileSystem throttling will have some benefits (and probably be
easier than some DN-side implementation) but again, it'd help to better
understand the problem you are trying to solve.

Best,
Andrew
On Mon, Nov 11, 2013 at 6:16 PM, Haosong Huang <[EMAIL PROTECTED]> wrote:

> Hi, lohit. There is a class named ThrottledInputStream
> <http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java>
> in hadoop-distcp; you can check it out to find more details.
>
> In addition to this, I am working on resource control (including CPU,
> network, and disk I/O) in the JVM. My implementation depends on cgroups,
> so it only runs on Linux. I will push my library (java-cgroup) to GitHub
> in the next several months. If you are interested in it, please send me
> any advice and help me improve it. :-)
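For reference, distcp's ThrottledInputStream (linked above) works by comparing the observed average read rate against a configured cap and sleeping whenever reads get ahead of it. A minimal, self-contained sketch of that idea follows; the class and field names are illustrative, not copied from the Hadoop source, and the in-memory stream stands in for an HDFS stream so the sketch runs standalone:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch of rate-limited reads: before each read, sleep until
// the average throughput since the stream was opened is back under the cap.
public class RateLimitedInputStream extends FilterInputStream {
    private final long maxBytesPerSec;
    private final long startTime = System.currentTimeMillis();
    private long bytesRead = 0;

    public RateLimitedInputStream(InputStream in, long maxBytesPerSec) {
        super(in);
        this.maxBytesPerSec = maxBytesPerSec;
    }

    // Average bytes/sec observed since the stream was opened.
    private double getRate() {
        long elapsedMs = Math.max(1, System.currentTimeMillis() - startTime);
        return bytesRead * 1000.0 / elapsedMs;
    }

    // Back off in small sleeps until the average rate is legal again.
    private void throttle() throws IOException {
        while (getRate() > maxBytesPerSec) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while throttling", e);
            }
        }
    }

    @Override
    public int read() throws IOException {
        throttle();
        int c = super.read();
        if (c >= 0) bytesRead++;
        return c;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        throttle();
        int n = super.read(b, off, len);
        if (n > 0) bytesRead += n;
        return n;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[64 * 1024];          // 64 KB of dummy data
        long t0 = System.currentTimeMillis();
        InputStream in = new RateLimitedInputStream(
                new ByteArrayInputStream(data), 32 * 1024); // cap: 32 KB/s
        byte[] buf = new byte[8192];
        while (in.read(buf) != -1) { /* drain */ }
        long ms = System.currentTimeMillis() - t0;
        System.out.println("drained 64 KB under a 32 KB/s cap in " + ms + " ms");
    }
}
```

In a real deployment the wrapper would go around the stream returned by FileSystem.open(); distcp's actual class also tracks total sleep time and supports a configurable bandwidth, which this sketch omits.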
>
>
> On Tue, Nov 12, 2013 at 3:47 AM, lohit <[EMAIL PROTECTED]> wrote:
>
> > Hi Adam,
> >
> > Thanks for the reply. The changes I was referring to were in the
> > FileSystem.java layer, which should not affect HDFS replication or
> > NameNode operations. To give a better idea, this would affect clients
> > something like this:
> >
> > Configuration conf = new Configuration();
> > conf.setInt("read.bandwidth.mbpersec", 20); // 20 MB/s
> > FileSystem fs = FileSystem.get(conf);
> >
> > FSDataInputStream fis = fs.open(new Path("/path/to/file.txt"));
> > fis.read(); // <-- reads would be capped at a max of 20 MB/s
> >
> >
> >
> >
> > 2013/11/11 Adam Muise <[EMAIL PROTECTED]>
> >
> > > See https://issues.apache.org/jira/browse/HDFS-3475
> > >
> > > Please note that this has met with many unexpected impacts on
> > > workloads. Be careful and be mindful of your DataNode memory and
> > > network capacity.
> > >
> > >
> > >
> > >
> > > On Mon, Nov 11, 2013 at 1:59 PM, lohit <[EMAIL PROTECTED]>
> > wrote:
> > >
> > > > Hello Devs,
> > > >
> > > > Wanted to reach out and see if anyone has thought about the ability
> > > > to throttle data transfer within HDFS. One option we have been
> > > > considering is to throttle on a per-FileSystem basis, similar to
> > > > Statistics in FileSystem. This would mean anyone with a handle to
> > > > HDFS/HFTP would be throttled globally within the JVM. The right
> > > > value for this would be based on the type of hardware we use and
> > > > how many tasks/clients we allow.
> > > >
> > > > On the other hand, doing something like this at the FileSystem layer
> > > > would mean many other tasks, such as job jar copies, DistributedCache
> > > > copies, and any hidden data movement, would also be throttled. We
> > > > wanted to know if anyone has had such a requirement on their clusters
> > > > in the past and what the thinking around it was. Appreciate your
> > > > inputs/comments.
> > > >
> > > > --
> > > > Have a Nice Day!
> > > > Lohit
> > > >
> > >
> > >
> > >
> > > --
> > >    * Adam Muise *       Solutions Engineer
> > > ------------------------------
> > >
> > >     Phone:        416-417-4037
> > >   Email:      [EMAIL PROTECTED]
> > >   Website:   http://www.hortonworks.com/
> > >
> > >       * Follow Us: *
> > > <http://facebook.com/hortonworks/?utm_source=WiseStamp&utm_medium=email&utm_term=&utm_content=&utm_campaign=signature>
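The per-FileSystem, JVM-global throttling lohit proposes above could be prototyped as a single shared limiter that every stream consults before reading. A rough sketch under that assumption, using a fixed one-second accounting window (all names here are invented for illustration; none of this is an actual Hadoop API):

```java
// Hypothetical sketch: one JVM-wide limiter, consulted before every read.
// A FileSystem-layer implementation would call acquire(len) inside the
// input stream's read() so that all streams share the same budget.
public class GlobalReadThrottle {
    private static long maxBytesPerSec = Long.MAX_VALUE;
    private static long windowStart = System.currentTimeMillis();
    private static long bytesInWindow = 0;

    public static synchronized void setLimit(long bytesPerSec) {
        maxBytesPerSec = bytesPerSec;
    }

    // Block until 'bytes' more bytes can be read without exceeding the cap.
    public static synchronized void acquire(long bytes)
            throws InterruptedException {
        long now = System.currentTimeMillis();
        if (now - windowStart >= 1000) {       // start a fresh 1 s window
            windowStart = now;
            bytesInWindow = 0;
        }
        // If this request would overflow a non-empty window, wait it out.
        while (bytesInWindow > 0 && bytesInWindow + bytes > maxBytesPerSec) {
            long remaining = 1000 - (System.currentTimeMillis() - windowStart);
            if (remaining > 0) {
                Thread.sleep(remaining);
            }
            windowStart = System.currentTimeMillis();
            bytesInWindow = 0;
        }
        bytesInWindow += bytes;
    }

    public static void main(String[] args) throws InterruptedException {
        setLimit(1_000_000);                   // cap at ~1 MB/s
        long t0 = System.currentTimeMillis();
        acquire(600_000);                      // fits in the first window
        acquire(600_000);                      // must wait for the next window
        long ms = System.currentTimeMillis() - t0;
        System.out.println("second acquire waited ~" + ms + " ms");
    }
}
```

Sleeping while holding the lock keeps the sketch short but would serialize all readers; a production limiter would drop the lock before sleeping, and as noted earlier in the thread, it still would not cover MR scratch-space I/O the way a DataNode-side or cgroups approach could.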
Later messages in this thread (collapsed):
  lohit (2013-11-12, 04:24)
  Steve Loughran (2013-11-12, 09:38)
  Andrew Wang (2013-11-13, 06:27)
  Steve Loughran (2013-11-13, 10:54)
  Andrew Wang (2013-11-18, 18:25)
  Jay Vyas (2013-11-18, 18:46)
  Andrew Wang (2013-11-18, 21:25)