Chukwa user mailing list: Supercharging Chukwa


Eric Fiala 2010-08-13, 16:11
Re: Supercharging Chukwa
There are two knobs that, together, throttle the agent processes.

These are httpConnector.maxPostSize and httpConnector.minPostInterval.

The maximum configured agent bandwidth is the ratio between those.  I
would try reducing the min post interval.  The defaults are, if I
remember right, something like 2 MB / 5 seconds = 400 KB/sec.  You can
crank that down a long way.  Nothing should explode even if you set
it to 1 ms.
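
To illustrate, here is a minimal sketch of those two knobs, assuming
Hadoop-style XML in conf/chukwa-agent-conf.xml (the file name, units,
and values here are from memory and illustrative only; check the config
files your distribution ships):

    <!-- conf/chukwa-agent-conf.xml (assumed); values are illustrative -->
    <property>
      <name>httpConnector.maxPostSize</name>
      <value>2097152</value>   <!-- max bytes sent per HTTP post (~2 MB) -->
    </property>
    <property>
      <name>httpConnector.minPostInterval</name>
      <value>1000</value>      <!-- min ms between posts; ~5000 by default -->
    </property>

With numbers like these the per-agent ceiling becomes roughly 2 MB per
second instead of 2 MB per 5 seconds, i.e. about five times the default
400 KB/sec.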

--Ari

On Fri, Aug 13, 2010 at 9:11 AM, Eric Fiala <[EMAIL PROTECTED]> wrote:
> Hello all,
> We would like to take our production Chukwa (0.3.0) infrastructure to the
> next level.
> Currently, we have 5 machines generating 400GB per day (80GB in a single
> log per machine).
> These use the chukwa-agent CharFileTailingAdaptorUTF8 adaptor.  Of note,
> chukwaAgent.fileTailingAdaptor.maxReadSize has been upped to 4000000.
> We've left httpConnector.maxPostSize at its default.
> The agents send to 3 chukwa-collectors, which are simply gateways into
> HDFS (one also handles demux/processing, but this doesn't appear to be the
> wall... yet).  The agents have all three collectors listed in their conf
> (a sketch of that file follows this message).
> We are hitting a wall somewhere: the whole 400GB is worked all the way
> into our repos over the course of the day, but during peaks we fall
> upwards of 1-2 hours behind between a record being written to the tailed
> log and it hitting hdfs://chukwa/logs as a .chukwa file.
> Further, we have observed that hdfs://chukwa/logs in our setup never fills
> faster than 2GB per 5-minute period, whether we use 2 chukwa-collectors or
> 3.  That is all the more discouraging given that foreseeable growth will
> take us to over ~575GB per day.
> The machines are definitely not load bound.  We have noticed that Chukwa
> was built with low resource utilization in mind; one thought is that if
> this could be tweaked, we could probably get more data through quicker.
> We have toyed with changing the default Xmx or similar values (see the
> heap-sizing sketch after this message) but don't want to start turning too
> many knobs before consulting the experts; considering all the pieces
> involved, that's probably wise.  Scaling out is also an option, but I'm
> determined to squeeze 10x or more of our current throughput out of these
> multicore machines.
> Any suggestions are welcome,
> Thanks.
> EF
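
The agent-side settings mentioned above live in two places.  A minimal
sketch, assuming the agent reads Hadoop-style XML from
conf/chukwa-agent-conf.xml and a plain list of collector URLs from
conf/collectors (both file names, the one-URL-per-line format, the port,
and the host names below are assumptions; verify against your install):

    <!-- conf/chukwa-agent-conf.xml: larger reads from the tailed file -->
    <property>
      <name>chukwaAgent.fileTailingAdaptor.maxReadSize</name>
      <value>4000000</value>   <!-- max bytes read per pass over the log -->
    </property>

    # conf/collectors: one collector per line; agents fail over down the list
    http://collector1.example.com:8080/
    http://collector2.example.com:8080/
    http://collector3.example.com:8080/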
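
On the Xmx question: the agents and collectors are ordinary JVM processes,
so their heap is set with the standard -Xmx flag at launch.  A sketch,
assuming the startup scripts honor a JVM-options variable in
conf/chukwa-env.sh (the variable name below is hypothetical; check the
scripts your version ships):

    # conf/chukwa-env.sh (hypothetical variable name)
    # Give each Chukwa JVM a 1 GB heap instead of the default.
    export CHUKWA_OPTS="$CHUKWA_OPTS -Xmx1024m"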

--
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department
Further replies in this thread:
Eric Yang 2010-08-16, 16:47
Eric Fiala 2010-08-17, 00:17