There are two knobs that, together, throttle the agent processes.
These are httpConnector.maxPostSize and httpConnector.minPostInterval.
The maximum configured agent bandwidth is the ratio between those. I
would try reducing the min post interval. The defaults are, if I
remember right, something like 2 MB / 5 seconds = 400 KB/sec. You can
crank that down a long way. Nothing should explode even if you set
it to 1 ms.
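To make the ratio concrete, here is a minimal Java sketch. It assumes maxPostSize is in bytes and minPostInterval is in milliseconds, and it plugs in the remembered defaults of roughly 2 MB and 5 seconds, which are not verified here. The class and method names are hypothetical and not part of Chukwa.

```java
// Hypothetical helper, not part of Chukwa: estimates the per-agent bandwidth
// ceiling implied by httpConnector.maxPostSize (assumed to be in bytes) and
// httpConnector.minPostInterval (assumed to be in milliseconds).
public class AgentBandwidthCeiling {

    /** Bytes/sec an agent can send if every post is full and posts are spaced minPostIntervalMs apart. */
    public static double ceilingBytesPerSec(long maxPostSizeBytes, long minPostIntervalMs) {
        if (minPostIntervalMs <= 0) {
            throw new IllegalArgumentException("minPostInterval must be positive");
        }
        return maxPostSizeBytes * 1000.0 / minPostIntervalMs;
    }

    public static void main(String[] args) {
        long maxPostSize = 2_000_000L; // ~2 MB, the remembered default (unverified)
        long[] intervalsMs = {5000, 1000, 100, 1};
        for (long interval : intervalsMs) {
            double kbPerSec = ceilingBytesPerSec(maxPostSize, interval) / 1000.0;
            System.out.printf("minPostInterval=%d ms -> ceiling ~%.0f KB/s%n", interval, kbPerSec);
        }
    }
}
```

Because the ceiling scales inversely with the interval, each tenfold cut in minPostInterval raises it tenfold, presumably until some other stage of the pipeline becomes the bottleneck.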
On Fri, Aug 13, 2010 at 9:11 AM, Eric Fiala <[EMAIL PROTECTED]> wrote:
> Hello all,
> We would like to bring our production Chukwa (0.3.0) infrastructure to the
> next level.
> Currently, we have 5 machines generating 400GB per day (80GB in single log,
> per machine).
> These are using chukwa-agent CharFileTailingAdaptorUTF8. Of
> note, chukwaAgent.fileTailingAdaptor.maxReadSize has been upped to 4000000.
> We've left httpConnector.maxPostSize at its default.
> The agents are sending to 3 chukwa-collectors which are simply gateways into
> HDFS (one also handles demux/processing - but this doesn't appear to be the
> wall... yet). The agents have all three collectors listed in their conf.
> We are hitting a wall somewhere. The whole 400GB does make it all the way into
> our repos over the course of the day, but during peaks we fall as much as
> 1-2 hours behind between data being written to the tailed log and it
> landing in hdfs://chukwa/logs as a .chukwa file.
> Further, we have observed that hdfs://chukwa/logs in our setup does not fill
> faster than 2GB per 5-minute period, whether we use 2 Chukwa
> collectors or 3. This is all the more discouraging since foreseeable growth
> will take us to over ~575GB per day.
> None of the machines are load bound. We have noticed that Chukwa was
> built with low resource utilization in mind; one thought is that if this could
> be tweaked, we could probably get more data through more quickly.
> We have toyed with changing the default Xmx or similar values, but don't want to
> start turning too many knobs before consulting the experts; considering all
> the pieces involved, that's probably wise. Scaling out is also an option, but
> I'm determined to squeeze 10x or more of our current throughput out of these
> multicore machines.
> Any suggestions are welcome,
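The figures in the quoted message can be cross-checked with a little arithmetic. A day has 288 five-minute periods, so a ceiling of 2GB per 5 minutes works out to about 576GB/day, essentially the ~575GB/day growth figure. The following is a minimal Java sketch of that check, assuming decimal units and a uniform 24-hour day; the class and method names are hypothetical.

```java
// Hypothetical back-of-the-envelope check of the throughput figures in the thread.
// Assumes decimal units and a uniform 24-hour day.
public class CollectorCeilingCheck {

    // Number of 5-minute periods in a day: 24 * 60 / 5 = 288.
    static final int PERIODS_PER_DAY = 24 * 60 / 5;

    /** Daily volume (GB) implied by a sustained rate of gbPer5Min. */
    public static double dailyGbFromFiveMinuteRate(double gbPer5Min) {
        return gbPer5Min * PERIODS_PER_DAY;
    }

    /** Average volume per 5-minute period (GB) needed to move gbPerDay. */
    public static double fiveMinuteGbFromDaily(double gbPerDay) {
        return gbPerDay / PERIODS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.printf("2 GB per 5 min sustained -> %.0f GB/day%n", dailyGbFromFiveMinuteRate(2.0));
        System.out.printf("400 GB/day today         -> %.2f GB per 5 min on average%n", fiveMinuteGbFromDaily(400.0));
        System.out.printf("575 GB/day projected     -> %.2f GB per 5 min on average%n", fiveMinuteGbFromDaily(575.0));
    }
}
```

At 400GB/day the average load is already about 1.39GB per 5 minutes, so peaks well above the average would run into the 2GB ceiling, which would be consistent with the backlog described; at 575GB/day even the average would sit right at the ceiling.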
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department