-
Embedding a broker into a producer? (Eric Tschetter, 2012-04-03, 01:05)
I'm setting up an HTTP endpoint that just takes a posted object and shoves it into Kafka. I'm imagining this as basically an embedded broker in my producer, and am wondering if there's a way to emit messages directly into the broker without actually setting up a Producer object? Or, is it just going to be simpler and better supported for me if I actually set up the separate objects and have them talk via whatever mechanism they end up talking via?

--Eric
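A minimal sketch of the endpoint Eric describes, using only Python's standard WSGI conventions. `make_app` and the `send` callable are hypothetical names: `send` stands in for whatever actually delivers the bytes, whether a Producer connected to localhost (the supported route) or some in-process hook into a broker (which Kafka does not expose as a public API). Keeping the transport behind one callable means the endpoint code stays the same whichever way that question is answered.

```python
from typing import Callable, Iterable


def make_app(send: Callable[[bytes], None]):
    """Return a WSGI app that forwards each POST body to `send`.

    `send` is a placeholder for the real transport (e.g. a Kafka
    Producer pointed at localhost); it is not itself a Kafka API.
    """

    def app(environ, start_response) -> Iterable[bytes]:
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed", [("Allow", "POST")])
            return [b""]
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        body = environ["wsgi.input"].read(length)
        send(body)  # hand the raw posted object to the transport
        start_response("202 Accepted", [("Content-Type", "text/plain")])
        return [b"queued\n"]

    return app
```

To serve it, something like `wsgiref.simple_server.make_server("", 8080, make_app(send)).serve_forever()` works for local testing.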
-
Re: Embedding a broker into a producer? (Jun Rao, 2012-04-03, 15:04)
Eric,

Try using the Producer API. Internal APIs are subject to change in the future and are not officially supported.

Thanks,

Jun
-
Re: Embedding a broker into a producer? (Eric Tschetter, 2012-04-03, 17:17)
Ok, I can do that (that's actually how our current stuff works as well). I was just hoping to maybe remove the need to tell my producer to connect to localhost so that it can talk to some other part of the code running in the same process.

Do you think you will ever have a Producer object implemented in terms of a KafkaServer object? Or, if that were to exist, would you be willing to take on the maintenance of it as part of the public API?

--Eric
-
Re: Embedding a broker into a producer? (Jun Rao, 2012-04-03, 18:30)
There is currently no plan for doing that. However, if you think this is a useful feature, please create a JIRA so that we can track it.

Thanks,

Jun
-
Re: Embedding a broker into a producer? (Edward Smith, 2012-04-12, 14:12)
Jun/Eric,

Just to add my two cents: I am starting a new project, and starting with KAFKA. The current architecture writes data to files on the producing hosts. Then a homebrew queuing system reads the files and passes them up to a consumer. Producer/consumer pairing is all done manually; there is no load balancing. Fault tolerance is handled by having the producer send to 2 consumers, duplicating the processing, and then ignoring the duplicate results.

My initial approach will be to run a KAFKA cluster and use a producer on the producing nodes to read the files from disk and send them up to the cluster, and then have consumers subscribe to the topics, etc. This seems like the 'normal' approach.

However, we have a requirement to support HA. If I stick with the approach above, I have to worry about replicating/mirroring the queues, which always gets sticky. We have to handle the case where a producer loses network connectivity, and so must be able to queue locally at the producer, which, I believe, means either putting the KAFKA broker here or continuing to use some 'homebrew' local queue. With brokers on the same node as producers, consumers only have to HA the results of their processing, and I don't have to HA the queues.

Any thoughts or feedback from the group is welcome.

Ed
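Edward's current fault-tolerance scheme (send each record to two consumers, process it twice, ignore the duplicate result) depends on recognizing which results are duplicates. A minimal sketch, assuming each record carries a unique id that both consumers copy onto their results; `FirstResultWins` is a hypothetical name, not part of Kafka or of Edward's system:

```python
from typing import Dict, Hashable, Optional


class FirstResultWins:
    """Keep the first result per message id; ignore later duplicates.

    Assumes both redundant consumers tag their results with the same
    unique, hashable message id (a hypothetical scheme for illustration).
    """

    def __init__(self) -> None:
        self._results: Dict[Hashable, object] = {}

    def accept(self, msg_id: Hashable, result: object) -> bool:
        """Return True if the result was stored, False if it was a duplicate."""
        if msg_id in self._results:
            return False
        self._results[msg_id] = result
        return True

    def get(self, msg_id: Hashable) -> Optional[object]:
        return self._results.get(msg_id)
```

The map grows without bound as written; a long-running version would need to expire ids once both copies have arrived or a time window has passed.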
-
Re: Embedding a broker into a producer? (Jun Rao, 2012-04-12, 16:18)
Ed,

We also thought about having a local log in the producer in case the producer can't send data to the brokers. It's doable. However, it adds a bit of complexity, both in the code and in operations (since producers now have to worry about storage, and there are typically many more producers than brokers).

Thanks,

Jun
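The producer-side local log Jun describes can be sketched as a wrapper that falls back to disk when a send fails and replays the spill file later. This is a minimal sketch, not Kafka's producer: `SpillingSender` and the `send` callable are hypothetical, and `send` is assumed to raise `OSError` (for example `ConnectionError`) when the brokers are unreachable. Records are length-prefixed so arbitrary bytes, including newlines, survive the round trip.

```python
import os
import struct
from typing import Callable, List


class SpillingSender:
    """Try to send; on failure, append the message to a local spill file."""

    def __init__(self, send: Callable[[bytes], None], spill_path: str) -> None:
        self.send = send  # hypothetical transport; raises OSError when down
        self.spill_path = spill_path

    @staticmethod
    def _frame(msg: bytes) -> bytes:
        return struct.pack(">I", len(msg)) + msg  # 4-byte big-endian length

    def publish(self, msg: bytes) -> bool:
        """Return True if sent now, False if spilled to disk."""
        try:
            self.send(msg)
            return True
        except OSError:
            with open(self.spill_path, "ab") as f:
                f.write(self._frame(msg))
            return False

    def _read_spill(self) -> List[bytes]:
        if not os.path.exists(self.spill_path):
            return []
        out = []
        with open(self.spill_path, "rb") as f:
            while True:
                header = f.read(4)
                if len(header) < 4:  # end of file (or a torn final write)
                    break
                (n,) = struct.unpack(">I", header)
                out.append(f.read(n))
        return out

    def drain(self) -> int:
        """Resend spilled messages in order; keep whatever still fails.

        Returns the number of messages successfully resent.
        """
        pending = self._read_spill()
        sent = 0
        remaining: List[bytes] = []
        for i, msg in enumerate(pending):
            try:
                self.send(msg)
            except OSError:
                remaining = pending[i:]
                break
            sent += 1
        # Rewrite the spill file with only the undelivered messages.
        with open(self.spill_path, "wb") as f:
            for msg in remaining:
                f.write(self._frame(msg))
        return sent
```

Even this small version shows the operational cost Jun mentions: every producer host now needs disk space for the spill file, and the rewrite in `drain()` is neither crash-safe nor safe against concurrent `publish()` calls, which a production version would have to address.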
-
Re: Embedding a broker into a producer? (Niek Sanders, 2012-04-12, 16:33)
Dealing with network/broker outages on the producer side is also something that I've been trying to solve.

Having a hook for the producer to dump to a local file would probably be the simplest solution. In the event of a prolonged outage, this file could be replayed once availability is restored.

The current approach I've been taking:

1) My bridge code between my data source and the Kafka producer writes everything to a local log file. When this bridge starts up, it generates a unique 8-character alphanumeric string. For each log entry it writes to the local file, it prefixes both the alphanumeric string and a log line number (0, 1, 2, 3, ...). The data already comes with timestamps.
2) In the event of a network outage or Kafka being unable to keep up with the producer, I simply drop the Kafka messages. I never allow my data source to be blocked because I'm waiting on the Kafka producer/broker.
3) For given time ranges, my consumers track all the alphanumeric identifiers that they consumed and the maximum complete sequence number that they have seen for each.

So I can manually go back to producers and replay any lost data (whether it was never sent because of a network outage or it died with a broker hardware failure).

I basically go to the producer machine (which I track in the Kafka message body) and say: for time A to time B, I received data for these identifiers and max sequence numbers: (najeh2wh, 12312), (ji3njdKL, 71). Replay anything that I'm missing.

I use random identifier strings because it saves me from having to persist the number of log lines my producer has generated (robustness against producer failure).

- Niek
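Niek's three steps can be sketched as below. Assumptions: the local log is an in-memory list instead of a file, a bounded `queue.Queue` stands in for the Kafka producer's input (so a full queue models "Kafka can't keep up"), and the time-range bookkeeping and producer-host tracking are left out. `Bridge` and `ConsumerTracker` are hypothetical names.

```python
import queue
import secrets
import string
from collections import defaultdict
from typing import Dict, List, Set, Tuple

ALPHABET = string.ascii_letters + string.digits
Entry = Tuple[str, int, str]  # (run_id, seq, payload)


def new_run_id(length: int = 8) -> str:
    """Random alphanumeric id, generated once per bridge start-up."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


class Bridge:
    """Producer side: log every entry locally, forward to Kafka best-effort."""

    def __init__(self, outbox: "queue.Queue[Entry]") -> None:
        self.run_id = new_run_id()
        self.seq = 0
        self.local_log: List[Entry] = []  # stands in for the local log file
        self.outbox = outbox  # stands in for the Kafka producer's input
        self.dropped = 0

    def handle(self, payload: str) -> None:
        entry = (self.run_id, self.seq, payload)
        self.local_log.append(entry)  # always recorded locally first
        self.seq += 1
        try:
            self.outbox.put_nowait(entry)  # never block the data source
        except queue.Full:
            self.dropped += 1  # recoverable later via replay()

    def replay(self, max_complete: Dict[str, int]) -> List[Entry]:
        """Entries after the consumer's max complete seq for this run id."""
        have = max_complete.get(self.run_id, -1)  # unknown id: replay all
        return [e for e in self.local_log if e[1] > have]


class ConsumerTracker:
    """Consumer side: per run id, the highest n such that 0..n were all seen."""

    def __init__(self) -> None:
        self.seen: Dict[str, Set[int]] = defaultdict(set)

    def record(self, run_id: str, seq: int) -> None:
        self.seen[run_id].add(seq)

    def max_complete(self) -> Dict[str, int]:
        result = {}
        for run_id, seqs in self.seen.items():
            n = -1
            while n + 1 in seqs:  # walk up until the first gap
                n += 1
            result[run_id] = n
        return result
```

Replaying everything after the maximum complete sequence number can resend records the consumer already has (those past the first gap), so consumers should be ready to discard duplicates by `(run_id, seq)`.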
-
Re: Embedding a broker into a producer? (Edward Smith, 2012-04-12, 16:47)
Niek,

Thanks for sharing your architecture. We are in a similar boat, as our current datastream is written to files first, and then the Kafka producer can read/transmit those.

What do you see as the downside to running a KAFKA broker and letting it write your files locally? I'm new to Kafka, so just exploring ideas here:

Producer-side broker downsides:
1. Heavier memory/processing footprint than just a producer.

Producer-side broker upsides:
1. Eliminates the middle man: you essentially have peer-to-peer operation between producers and consumers, with ZK as the coordinator. For me, this is big, since I don't have to worry about high availability (HA) for the brokers.
2. Eliminates duplicating data on disk at both the producer and the broker.
3. Data has been demultiplexed into its topics when it is on disk at the producer/broker. This means that I can purge data based on per-topic policies. (Our data arrives multiplexed and has to be split into topics; we also run out of storage during network outages.)

In the research I've been doing, this is the model proposed by the 0mq (ZeroMQ) folks, I think. It's just that all of the wiring is already written in Kafka.

Ed
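Edward's third upside (data already split by topic, so it can be purged per topic when storage runs short during an outage) can be illustrated with a small sketch. It keeps buffers in memory rather than on disk, and the `topic_of` extractor and per-topic byte budgets are hypothetical policy inputs, not Kafka settings:

```python
from collections import deque
from typing import Callable, Deque, Dict


class TopicBuffers:
    """Demultiplex a mixed stream into per-topic buffers with byte caps.

    When a topic exceeds its budget, its own oldest records are purged
    first, so one noisy topic cannot exhaust storage for the others.
    """

    def __init__(self, topic_of: Callable[[bytes], str],
                 max_bytes: Dict[str, int], default_max: int = 1 << 20) -> None:
        self.topic_of = topic_of  # hypothetical: extracts a record's topic
        self.max_bytes = max_bytes  # hypothetical per-topic budgets
        self.default_max = default_max
        self.buffers: Dict[str, Deque[bytes]] = {}
        self.sizes: Dict[str, int] = {}
        self.purged: Dict[str, int] = {}

    def add(self, record: bytes) -> str:
        topic = self.topic_of(record)
        buf = self.buffers.setdefault(topic, deque())
        buf.append(record)
        self.sizes[topic] = self.sizes.get(topic, 0) + len(record)
        limit = self.max_bytes.get(topic, self.default_max)
        # Purge oldest records of this topic only, until it fits its budget.
        while self.sizes[topic] > limit and buf:
            old = buf.popleft()
            self.sizes[topic] -= len(old)
            self.purged[topic] = self.purged.get(topic, 0) + 1
        return topic
```

Purging oldest-first within a topic is only one possible policy; the benefit of the per-topic split is that the policy can differ from topic to topic.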
-
Re: Embedding a broker into a producer? (Niek Sanders, 2012-04-12, 21:38)
> What do you see as the downside to running a KAFKA Broker and letting
> it write your files locally?

Here is every downside I can come up with. Some are trivial, but I've tried to play devil's advocate.

1) Additional memory/processing footprint, as you mentioned.
2) Additional network usage by Kafka consumers hitting the producer box.
3) Loss of either producer elasticity or message retention. One of the awesome features of Kafka is being able to hold onto a long history of messages and replay them as needed. But if the producer machines host my brokers, I can no longer scale down the number of producer machines as the system load drops without also losing the message history stored on them: either I keep the machines (a loss of elasticity) or I lose their retained messages. This ties into another group discussion about decommissioning brokers.
4) Security: some producers live on web-accessible machines. The fewer open ports accepting incoming connections, the better. Not a huge issue with good firewall rules, but still something to ponder.
5) Loss of data duplication has downsides too. Having multiple copies of data does violate DRY, but it can also add robustness. If the hard disk on either the producer or the brokers dies, you still have the data lying around. Since the data is not getting changed on either the broker or any producer log files, you shouldn't have the syncing issues normally associated with DRY violations.

- Niek
-
Re: Embedding a broker into a producer? (Edward Smith, 2012-04-12, 21:53)
Niek,

Thanks for the response. I agree with your assessments. It's a beautiful thing that I don't have to commit to this decision early in the build process. I can start it one way and just switch to the other, as it is all a matter of configuration and not code.

I'll let you know if we learn anything interesting along the way.

Ed