Eric Fiala 2010-08-13, 16:11
Hello all, We would like to bring our production Chukwa (0.3.0) infrastructure to the next level.
Currently, we have 5 machines generating 400GB per day (80GB in a single log, per machine). These are using the chukwa-agent CharFileTailingAdaptorUTF8. Of note, chukwaAgent.fileTailingAdaptor.maxReadSize has been upped to 4000000. We've left httpConnector.maxPostSize at its default.
The agents are sending to 3 chukwa-collectors which are simply gateways into HDFS (one also handles demux/processing - but this doesn't appear to be the wall... yet). The agents have all three collectors listed in their conf.
We are hitting a wall somewhere. The whole 400GB does make it all the way into our repos over the course of the day, but during peaks we fall as much as 1-2 hours behind between data being written to the tailed log and hitting hdfs://chukwa/logs as a .chukwa file. Further, we have observed that hdfs://chukwa/logs in our setup does not fill faster than 2GB per 5 minute period, whether we use 2 chukwa collectors or 3. This is all the more discouraging given that foreseeable growth will take us to over ~575GB per day.
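To put those numbers side by side, here is a rough back-of-the-envelope sketch in Python. It assumes decimal gigabytes and a perfectly even load over the day, neither of which is stated above, and the function name is made up for illustration.

```python
# Convert a daily log volume into the average volume per 5-minute window,
# so it can be compared with the observed ~2GB per 5 minutes ceiling.
# Assumption: load is spread evenly over 24 hours (real traffic has peaks).

SECONDS_PER_DAY = 24 * 60 * 60
WINDOW_SECONDS = 5 * 60


def gb_per_window(gb_per_day: float) -> float:
    """Average GB arriving in one 5-minute window for a given daily volume."""
    return gb_per_day * WINDOW_SECONDS / SECONDS_PER_DAY


for daily_gb in (400, 575):
    print(f"{daily_gb} GB/day averages {gb_per_window(daily_gb):.2f} GB per 5 minutes")
```

Under those assumptions, 400GB per day averages roughly 1.4GB per 5 minutes, and 575GB per day averages just under 2GB per 5 minutes, so at the projected volume even the daily average would sit at the observed ceiling, leaving no headroom for peaks.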
None of the machines is load bound. We have noticed that chukwa was built with low resource utilization in mind; one thought is that if this could be tweaked, we could probably get more data through quicker.
We have toyed with changing the default Xmx or similar values, but don't want to start turning too many knobs before consulting the experts; considering all the pieces involved, that seems wise. Scaling out is also an option, but I'm determined to squeeze x10 or more than current out of these multicore machines.
Any suggestions are welcome, Thanks. EF
-
Re: Supercharging Chukwa
Ariel Rabkin 2010-08-13, 17:26
There are two knobs that, together, throttle the agent processes.
These are httpConnector.maxPostSize and httpConnector.minPostInterval
The maximum configured agent bandwidth is the ratio between those. I would try reducing the min post interval. The defaults are, if I remember right, something like 2 MB / 5 seconds = 400 KB/sec. You can crank the interval down a long way. Nothing should explode even if you set it to 1 ms.
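The ratio can be sketched as a quick calculation. This is only an illustration of the arithmetic above, not Chukwa code: the function name is hypothetical, and it assumes maxPostSize is given in bytes (taking "2 MB" as 2,000,000 bytes) and minPostInterval in milliseconds.

```python
# Per-agent bandwidth ceiling implied by the two throttle settings:
# at most one post of max_post_size_bytes every min_post_interval_ms.
# Assumptions: size in bytes, interval in milliseconds, "2 MB" = 2,000,000 bytes.


def max_agent_bandwidth(max_post_size_bytes: int, min_post_interval_ms: int) -> float:
    """Upper bound on one agent's send rate, in bytes per second."""
    return max_post_size_bytes * 1000 / min_post_interval_ms


default_cap = max_agent_bandwidth(2_000_000, 5_000)
print(f"default cap: {default_cap / 1000:.0f} KB/sec per agent")
```

Halving the interval doubles the ceiling, so the interval is the cheaper knob to turn first; the achieved rate can still be lower if something else (network, collectors, HDFS) is the bottleneck.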
--Ari
-- Ari Rabkin [EMAIL PROTECTED] UC Berkeley Computer Science Department
-
Re: Supercharging Chukwa
Eric Yang 2010-08-16, 16:47
I think we forgot to document those two parameters in the template config. I filed CHUKWA-509 in JIRA to document this.
Regards, Eric
-
Re: Supercharging Chukwa
Eric Fiala 2010-08-17, 00:17
Ari, Thanks for your insight - it appears to have thrust us over the latest barrier! After lowering httpConnector.minPostInterval to 500 we have achieved much higher sustained throughput rates - data is flowing to the agents, through the collectors and landing in HDFS within seconds - we observed 3.2GB collected in the peak 5 minute period today (well beyond our prior 2GB wall!). I can confirm the defaults are 5000ms for httpConnector.minPostInterval and 2MB for httpConnector.maxPostSize - it's good to know that I should be able to decrease minPostInterval even more should the need arise.
Once again, many thanks. EF