Yes, the agent-->collector path is HTTP.
This was done precisely to allow load balancers. I don't know how
well tested that configuration is, though. I think most sites had Chukwa
itself do the load balancing by specifying multiple collectors.
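
To make the "multiple collectors" behaviour concrete, here is a rough sketch of the selection policy described in the question below: pick one collector at random and stick with it until a send fails. This is an illustration only, not Chukwa's actual code. The names CollectorSelector, send_chunk and post are hypothetical, and the retry count is an assumption.

```python
import random


class CollectorSelector:
    """Pick a collector at random and keep using it until it fails.

    Hypothetical helper for illustration; not part of Chukwa.
    """

    def __init__(self, collectors, rng=None):
        if not collectors:
            raise ValueError("need at least one collector URL")
        self._collectors = list(collectors)
        self._rng = rng or random.Random()
        self._current = self._rng.choice(self._collectors)

    @property
    def current(self):
        return self._current

    def mark_failed(self):
        """Switch to a different collector, if one is configured."""
        others = [c for c in self._collectors if c != self._current]
        if others:
            self._current = self._rng.choice(others)
        return self._current


def send_chunk(selector, chunk, post, max_attempts=3):
    """Try to deliver one chunk, failing over when a send raises OSError.

    `post(url, chunk)` is a caller-supplied function that performs the
    HTTP POST; it is an assumption of this sketch.
    """
    for _ in range(max_attempts):
        url = selector.current
        try:
            post(url, chunk)
            return url
        except OSError:
            selector.mark_failed()
    raise RuntimeError("no collector accepted the chunk")
```

With a load balancer in front, the list would hold a single URL, so mark_failed has nowhere else to go and failover becomes the balancer's job.
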
There is a notion of end-to-end reliability: the so-called
asynchronous ack mechanism. It's off by default and hasn't been tried
much in production. See
the detailed design of it.
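
As a generic illustration of what an asynchronous ack buys you (not a description of Chukwa's actual protocol, which is covered in the design document), the sender keeps each chunk until a later ack says it was durably stored, and resends anything that stays unacknowledged too long. The class name, the chunk ids and the 60 second timeout are all assumptions made for this sketch.

```python
import time


class PendingChunks:
    """Track chunks that were sent but not yet acknowledged as stored.

    Hypothetical sketch of sender-side bookkeeping; not Chukwa code.
    """

    def __init__(self, timeout_s=60.0, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._pending = {}  # chunk_id -> (chunk, time it was sent)

    def sent(self, chunk_id, chunk):
        """Record a chunk right after handing it to a collector."""
        self._pending[chunk_id] = (chunk, self._clock())

    def acked(self, chunk_id):
        """Forget a chunk once the ack for it arrives."""
        self._pending.pop(chunk_id, None)

    def due_for_resend(self):
        """Return (chunk_id, chunk) pairs whose ack is overdue."""
        now = self._clock()
        return [
            (chunk_id, chunk)
            for chunk_id, (chunk, sent_at) in self._pending.items()
            if now - sent_at >= self._timeout_s
        ]
```

The point is that a successful HTTP response only proves the collector received the chunk; the chunk leaves the pending set only after the separate, later ack.
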
On Fri, Jul 29, 2011 at 11:04 AM, T. A. Smooth <[EMAIL PROTECTED]> wrote:
> Hello, I am checking out Chukwa. I have a few questions I was hoping the mail
> list could answer :-)
> 1) Do Chukwa agents communicate to collectors over HTTP? Or some other
> The agent configuration makes me believe that:
> 2) From the docs it seems an Agent will pick a collector at random and then
> use that collector until there is a problem in communicating with it. How do
> you think the agent/collector would act if they have a load balancer between
> them? For example, the agent configuration would have just one url
> http://collector-loadbalancer.example.com:8080/
> The load balancer would have 1 or more collectors behind it saving the
> chunks it receives to disk or hadoop.
> 3) Does Chukwa have any “end-to-end” reliability features for message
> delivery? For example, a collector may receive the chunk from the agent but
> it may have a problem writing it to the data store (i.e., disk space full,
> connection to Hadoop down). Will the agent be notified that the chunk was
> not processed for a certain reason and the agent is told to cache to disk
> the missed message?
> Thanks for the info!
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department