|
Mark
2011-11-05, 01:52
Jun Rao
2011-11-05, 02:09
Mark
2011-11-05, 18:49
Jun Rao
2011-11-05, 18:56
Neha Narkhede
2011-11-05, 18:56
Jay Kreps
2011-11-05, 20:19
Jay Kreps
2011-11-05, 20:29
Mark
2011-11-05, 21:03
Jay Kreps
2011-11-05, 21:28
Tim Lossen
2011-11-06, 06:49
Mark
2011-11-06, 17:05
Tim Lossen
2011-11-07, 08:27
Jun Rao
2011-11-07, 16:25
Taylor Gautier
2011-11-07, 16:30
Chris Burroughs
2011-11-10, 21:17
Chris Burroughs
2011-11-15, 01:12
|
-
Zookeeper
Mark 2011-11-05, 01:52
I just noticed that there is an option to not use ZooKeeper and instead use a static list of brokers (#9 on http://incubator.apache.org/kafka/quickstart.html). Do I put this list in server.properties?

It doesn't seem like you save much either way, as you have to either
a) list out all the nodes in the ZooKeeper quorum in zookeeper.properties, or
b) list out static brokers in server.properties.

What are the benefits of using ZooKeeper over a static list? Can someone also explain how Kafka uses ZooKeeper?

Thanks
-
Re: Zookeeper
Jun Rao 2011-11-05, 02:09
broker.list is used in the producer property file. One caveat is that the broker.list approach doesn't do health checks, which means that if a broker goes down, the client could still try to send messages to it. At LinkedIn, we rely on a load balancer to do the health check for us. The ZK-based producer, on the other hand, does health checks.

You can find more details about our ZK design on the design page of the website or in the paper at https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations.

Jun
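To make the caveat concrete, here is a minimal sketch of a client that round-robins over a static broker list. It assumes the list is written as comma-separated `id:host:port` entries; the host names and the `StaticBrokerPicker` class are hypothetical and are not part of the Kafka client. The point is only that nothing in this scheme knows whether a broker is up.

```python
from itertools import cycle


def parse_broker_list(value):
    """Parse 'id:host:port,id:host:port' into (id, host, port) tuples.

    The entry format is an assumption made for this sketch.
    """
    brokers = []
    for entry in value.split(","):
        broker_id, host, port = entry.strip().split(":")
        brokers.append((int(broker_id), host, int(port)))
    return brokers


class StaticBrokerPicker:
    """Hypothetical helper: round-robins over a fixed list of brokers.

    It never checks liveness, so a broker that has gone down keeps
    being returned until the list itself is edited.
    """

    def __init__(self, brokers):
        self._cycle = cycle(brokers)

    def next_broker(self):
        return next(self._cycle)


# Hypothetical host names, for illustration only.
brokers = parse_broker_list("0:kafka1.example.com:9092, 1:kafka2.example.com:9092")
picker = StaticBrokerPicker(brokers)
picks = [picker.next_broker()[1] for _ in range(4)]
```

If `kafka2.example.com` were down, every second pick would still point at it, which is the gap a load balancer or the ZK-based producer is meant to close.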
-
Re: Zookeeper
Mark 2011-11-05, 18:49
Sorry, but I'm a bit confused now. So at LinkedIn, do you use a load balancer instead of ZooKeeper, or in conjunction with ZooKeeper?

Thanks
-
Re: Zookeeper
Jun Rao 2011-11-05, 18:56
Mark,

At LinkedIn, we use both the ZK-based and the broker-list-based producer. For the latter, the broker list has only one entry, which points to a VIP in a load balancer.

Thanks,
Jun
-
Re: Zookeeper
Neha Narkhede 2011-11-05, 18:56
Mark,

Most publishers at LinkedIn use a hardware load balancer approach. These are configured to do a TCP health check that monitors whether the Kafka port on a broker is working. If it is, requests are forwarded to the broker. Some publishers, though, use the software load balancer based on ZooKeeper. Those applications want to do key-based partitioning of data.

Thanks,
Neha
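A TCP health check of the kind described here can be sketched in a few lines. This is only an illustration of the idea (try to open a connection to the broker port and treat success as healthy); it is not how any particular hardware load balancer is configured, and the function names are made up for the example.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def healthy_brokers(brokers, timeout=1.0):
    """Keep only the (host, port) pairs that currently accept connections."""
    return [(host, port) for host, port in brokers if port_is_open(host, port, timeout)]
```

A balancer would run a probe like this periodically and forward requests only to the brokers that pass. Note that an open port says the process is listening, not that it is handling requests correctly.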
-
Re: Zookeeper
Jay Kreps 2011-11-05, 20:19
The motivation here is that literally every production process at LinkedIn sends messages to Kafka as part of either user tracking or operational monitoring or both. We are wary of adding that many zk connections and watches, so we run this first tier through a simple L2 load balancer that just randomly balances connections over brokers. The good part about this is that we can do zookeeper upgrades without redeploying all the production apps to upgrade their zk jar.

As Neha says, the zk producer is used for key-based partitioning by the smaller number of producers who need that.

-Jay
-
Re: Zookeeper
Jay Kreps 2011-11-05, 20:29
It is also worth mentioning that this is just for producers; consumers always use zookeeper for load balancing and coordination. Logically this makes sense: partitioning production is trivial if you don't care about the semantics of key=>partition assignment, but partitioning consumption is more complex because you need to divide up the partitions amongst the set of all consumers exactly.

-jay
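The asymmetry between the two sides can be sketched as follows. On the producer side, a random pick needs no coordination, and a key-based pick only needs a stable hash. On the consumer side, every partition must end up with exactly one owner, so all consumers have to agree on the same split. The hash function (CRC32) and the contiguous-range split below are assumptions chosen for the example, not necessarily what Kafka itself does.

```python
import random
import zlib


def random_partition(num_partitions):
    """Producer side, no key semantics: any partition will do."""
    return random.randrange(num_partitions)


def key_partition(key, num_partitions):
    """Producer side, key-based: the same key always maps to the same partition.

    CRC32 is used because it is stable across processes; it is an
    illustrative choice only.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(partitions, consumers):
    """Consumer side: split partitions so each one has exactly one owner.

    Both inputs are sorted first, so every consumer that runs this with
    the same membership view computes the same assignment.
    """
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    if not consumers:
        return {}
    base, extra = divmod(len(partitions), len(consumers))
    assignment = {}
    start = 0
    for index, consumer in enumerate(consumers):
        # The first `extra` consumers take one additional partition.
        count = base + (1 if index < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment
```

The consumer-side function is only correct if all consumers see the same list of partitions and the same list of group members, which is the kind of shared view a coordination service such as ZooKeeper provides.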
-
Re: Zookeeper
Mark 2011-11-05, 21:03
OK, so no matter what, ZooKeeper is still required when using Kafka. One just has the option to load-balance producer => broker connections via either ZooKeeper or a load balancer.

Is that correct? If so, I think I finally got it :)
-
Re: Zookeeper
Jay Kreps 2011-11-05, 21:28
That's correct. The option is primarily for testing purposes.
-
Re: Zookeeper
Tim Lossen 2011-11-06, 06:49
we are using kafka entirely without zookeeper, and it is working fine so far: single kafka broker, ruby consumers without coordination.

tim
-- http://tim.lossen.de
-
Re: Zookeeper
Mark 2011-11-06, 17:05
Tim,

Would you mind explaining how you use Kafka? Basically a general overview of the messages/events you are capturing and how you go about processing them. We will also be using kafka-rb, so I'm particularly interested in how others are using it.

- M
-
Re: ZookeeperTim Lossen 2011-11-07, 08:27
sure, we are not in production yet, so things might still
change, but our current setup is as follows:

- no zookeeper
- single kafka broker
- second kafka broker as standby
- logs are rsynced to standby every 5 minutes
- topics not (yet) partitioned
- multithreaded jruby consumer
- each thread with separate kafka client instance

cheers
tim

On 2011-11-06, at 18:05, Mark wrote:
> Tim,
>
> Would you mind explaining how you use Kafka? Basically the general overview of the messages/events you are capturing and how you go about processing them. We will also be using kafka-rb so I'm particularly interested in how others are using it.
>
> - M

http://tim.lossen.de
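The consumer arrangement described in this message (several threads, each with its own client instance, no ZooKeeper coordination) can be sketched roughly as follows. This is only an illustration: `StubKafkaClient`, `consume`, and `run_consumers` are hypothetical names, the stub is not the kafka-rb API or any real Kafka client, and giving each thread its own topic is an assumption the thread does not state.

```python
import queue
import threading


class StubKafkaClient:
    """Hypothetical stand-in for a Kafka client; not a real library API."""

    def __init__(self, messages):
        self._messages = list(messages)
        self._offset = 0  # each client tracks its own read position

    def fetch(self, max_messages=10):
        """Return the next batch and advance this client's offset."""
        batch = self._messages[self._offset:self._offset + max_messages]
        self._offset += len(batch)
        return batch


def consume(topic, messages, results):
    # One client per thread: no client object is ever shared between threads.
    client = StubKafkaClient(messages)
    while True:
        batch = client.fetch()
        if not batch:
            break
        for message in batch:
            results.put((topic, message))


def run_consumers(topic_messages):
    """Start one consumer thread per topic (an assumption) and collect output."""
    results = queue.Queue()
    threads = [
        threading.Thread(target=consume, args=(topic, msgs, results))
        for topic, msgs in topic_messages.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return sorted(collected)
```

Because every thread owns its client and its offset, the threads need no locking around the client itself; only the shared results queue is thread-safe by design.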
-
Re: Zookeeper Jun Rao 2011-11-07, 16:25
Hi, Tim,
Thanks for sharing this. As part of the replication work (KAFKA-50), partitions will become logical and their physical locations are registered in ZK. This will make it difficult to use Kafka without ZK. Overall, I think that simplifies the client. However, if you have any concerns, please comment on the mailing list or the jira.

Jun

On Mon, Nov 7, 2011 at 12:27 AM, Tim Lossen <[EMAIL PROTECTED]> wrote:
> sure, we are not in production yet, so things might still
> change, but our current setup is as follows:
>
> - no zookeeper
> - single kafka broker
> - second kafka broker as standby
> - logs are rsynced to standby every 5 minutes
> - topics not (yet) partitioned
> - multithreaded jruby consumer
> - each thread with separate kafka client instance
>
> cheers
> tim
-
Re: Zookeeper Taylor Gautier 2011-11-07, 16:30
Right now, we do not use ZK either - we have both producer and consumer
side sharding that takes care of sending messages to topics on the right kafka instance. Each kafka instance we deploy is a complete silo and has no knowledge of other kafka instances.

Since our initial use case is somewhat outside the envelope of what Kafka was built for, we felt this was necessary - basically we have a very large # of topics with low throughput, while the primary use case for Kafka as I understand it is a low # of topics with high throughput. Eventually we will use Kafka for both kinds of use cases.

I have probably mentioned it before on the list, but one thing we haven't had a chance to look into carefully is whether the user provided partitioning scheme can implement what we are trying to do, basically send messages to a given broker based on topics so the topics are spread across the cluster, allowing us to increase the total # of topics we can support.

On Mon, Nov 7, 2011 at 8:25 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> Hi, Tim,
>
> Thanks for sharing this. As part of the replication work (KAFKA-50),
> partitions will become logical and their physical locations are registered
> in ZK. This will make it difficult to use Kafka without ZK. Overall, I
> think that simplifies the client. However, if you have any concerns, please
> comment on the mailing list or the jira.
>
> Jun
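A minimal sketch of the topic-based sharding described in this message, where every message for a given topic goes to one broker and topics are spread across independent brokers. The function name `broker_for_topic` and the broker addresses in the usage note are hypothetical, and hashing the topic name with CRC32 is one possible choice, not something the thread specifies.

```python
import zlib


def broker_for_topic(topic, brokers):
    """Pick the broker that owns a topic by hashing the topic name.

    `brokers` is an ordered list of broker addresses. The same list, in the
    same order, must be used by producers and consumers so both sides agree.
    """
    if not brokers:
        raise ValueError("broker list must not be empty")
    # crc32 is stable across processes and runs, unlike Python's built-in
    # hash() for strings, which is randomized per process.
    index = zlib.crc32(topic.encode("utf-8")) % len(brokers)
    return brokers[index]
```

For example, `broker_for_topic("user-signups", ["kafka1:9092", "kafka2:9092"])` returns the same address on every call and in every process. One trade-off of plain modulo hashing is that changing the number of brokers reassigns most topics, so growing the cluster would need a migration step or a different scheme such as consistent hashing.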
-
Re: Zookeeper Chris Burroughs 2011-11-10, 21:17
On 11/07/2011 11:25 AM, Jun Rao wrote:
> Thanks for sharing this. As part of the replication work (KAFKA-50),
> partitions will become logical and their physical locations are registered
> in ZK. This will make it difficult to use Kafka without ZK.

Do you anticipate replication requiring changes to SyncProducer or SimpleConsumer? I think it's worthwhile to maintain the ability to use kafka as a simple local daemon with no knowledge of the outside world.
-
Re: Zookeeper Chris Burroughs 2011-11-15, 01:12
+list for reference. I'll review the replication doc again.
On 11/10/2011 08:39 PM, Jun Rao wrote:
> Both SyncProducer and SimpleConsumer will still be there. However, with
> replication, only the broker hosting the master of a partition can handle
> the read/write request. So if you send a request to a wrong host, your
> request will be rejected. Our high level producer/consumer api will be able
> to figure out the correct host from ZK.
>
> Jun
>
> On Thu, Nov 10, 2011 at 1:17 PM, Chris Burroughs <[EMAIL PROTECTED]> wrote:
>
>> On 11/07/2011 11:25 AM, Jun Rao wrote:
>>> Thanks for sharing this. As part of the replication work (KAFKA-50),
>>> partitions will become logical and their physical locations are registered
>>> in ZK. This will make it difficult to use Kafka without ZK.
>>
>> Do you anticipate replication requiring changes to SyncProducer or
>> SimpleConsumer? I think it's worthwhile to maintain the ability to use
>> kafka as a simple local daemon with no knowledge of the outside world.
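The routing behaviour in the quoted reply (only the broker hosting the master of a partition accepts a request, and a higher-level client looks up the right host) can be illustrated with a small simulation. Everything here is hypothetical: `StubBroker`, `RoutingProducer`, and `NotLeaderError` are invented names, and a plain dictionary stands in for the ZooKeeper registry. It is not the Kafka client API.

```python
class NotLeaderError(Exception):
    """Raised by a stub broker that does not host the partition's master."""


class StubBroker:
    """Hypothetical broker that only accepts writes for partitions it leads."""

    def __init__(self, name, led_partitions):
        self.name = name
        self.led = set(led_partitions)  # set of (topic, partition) pairs
        self.log = []

    def produce(self, topic, partition, message):
        if (topic, partition) not in self.led:
            # Mirrors "if you send a request to a wrong host, your request
            # will be rejected".
            raise NotLeaderError(
                "%s does not lead %s/%d" % (self.name, topic, partition)
            )
        self.log.append((topic, partition, message))


class RoutingProducer:
    """Looks up the leading broker in a registry before sending."""

    def __init__(self, brokers, registry):
        self.brokers = brokers    # broker name -> StubBroker
        self.registry = registry  # (topic, partition) -> broker name; stand-in for ZK

    def send(self, topic, partition, message):
        leader = self.registry[(topic, partition)]
        self.brokers[leader].produce(topic, partition, message)
        return leader
```

A client that talks to a fixed host without consulting the registry, as a low-level producer would in this model, has to already know which broker leads the partition or handle the rejection itself.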