Founder, Principal Consultant
Big Data Open Source Security LLC
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
On Wed, Aug 7, 2013 at 2:41 AM, masoom alam <[EMAIL PROTECTED]> wrote:
> Hi Joe,
> Many thanks for such a detailed response.
> > So you would have a topic called "TypeA" and then set up a consumer group,
> > and those consumers (if you really only needed 1 consumer, set the
> > partitions to 1) would get everything from the "TypeA" topic. If you had
> > more event types, then just set up more topics and then a consumer group
> > for each of those topics.
> Got it, so we will have multiple topics, and each topic will be logically
> associated with a set of consumers (a consumer group). I was thinking
> that this approach would be more efficient, since each individual topic
> is associated with its own consumer or consumer group.
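Within one consumer group, the topic's partitions are divided among the group's consumers, while every separate group receives the whole topic. The sketch below is a simplified, range-style split and not Kafka's actual rebalance code; the function name `range_assign` and the consumer ids are hypothetical. It also shows why setting the partitions to 1 means only one consumer in a group receives anything.

```python
def range_assign(partitions, consumers):
    """Split one topic's partitions among the consumers of a single group.

    Hypothetical range-style sketch: sort both lists, give each consumer a
    contiguous block of len(partitions) // len(consumers) partitions, and
    give one extra partition to each of the first
    len(partitions) % len(consumers) consumers. Consumers beyond the
    partition count end up with an empty list.
    """
    parts = sorted(partitions)
    members = sorted(consumers)
    if not members:
        return {}
    per_consumer, extra = divmod(len(parts), len(members))
    assignment = {}
    start = 0
    for i, consumer in enumerate(members):
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = parts[start:start + count]
        start += count
    return assignment


if __name__ == "__main__":
    # One partition: only one consumer in the group gets "TypeA" events.
    print(range_assign([0], ["c1", "c2"]))  # {'c1': [0], 'c2': []}
    # Four partitions spread over three consumers.
    print(range_assign(range(4), ["c1", "c2", "c3"]))  # {'c1': [0, 1], 'c2': [2], 'c3': [3]}
```

Each additional consumer group would run the same split independently over the same partitions, which is how several groups can all read the full "TypeA" stream.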
> > Now, depending on what you need to do, you may need topics to be organized
> > by some other value instead; in that case you can still "pin" data to a
> > consumer by using semantic partitioning:
> > http://kafka.apache.org/design.html
> > *Semantic partitioning*
> > "Consider an application that would like to maintain an aggregation of
> > number of profile visitors for each member. It would like to send all
> > profile visit events for a member to a particular partition and, hence,
> > have all updates for a member to appear in the same stream for the same
> > consumer thread. The producer has the capability to be able to
> > map messages to the available kafka nodes and partitions. This allows
> > partitioning the stream of messages with some semantic partition function
> > based on some key in the message to spread them over broker machines. The
> > partitioning function can be customized by providing an implementation of
> > the kafka.producer.Partitioner interface, default being the random
> > partitioner. For the example above, the key would be member_id and the
> > partitioning function would be hash(member_id)%num_partitions."
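A minimal sketch of the hash(member_id) % num_partitions idea quoted above, written in Python rather than against the real `kafka.producer.Partitioner` interface (a JVM API). The function name `choose_partition` is hypothetical, and `zlib.crc32` stands in for the hash, so the partition numbers will not match what Kafka's own partitioner would pick; what carries over is that the same key always lands on the same partition.

```python
import random
import zlib


def choose_partition(key, num_partitions, rng=random):
    """Pick a partition for one message, in the spirit of hash(key) % num_partitions.

    zlib.crc32 is used instead of Python's built-in hash() because str hashing
    is randomized per process, which would break "same key, same partition"
    across producer restarts.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if key is None:
        # No key: fall back to a random partition, like the random default described above.
        return rng.randrange(num_partitions)
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


if __name__ == "__main__":
    visits = [("member-42", "visit"), ("member-7", "visit"), ("member-42", "visit")]
    for member_id, event in visits:
        # Both "member-42" events print the same partition number.
        print(member_id, event, "->", choose_partition(member_id, 4))
```

Because the partition depends only on the key and the partition count, changing the number of partitions later reshuffles which partition (and therefore which consumer thread) a given member maps to.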
> If I understand you correctly, the responsibility for mapping events
> will fall on the producers, right?
Well, not exactly. Under the hood, the event name just gets passed in as the
key; the actual mapping of which partition it goes to is handled by Kafka.
> What if we want to have a function on the Kafka broker nodes that actually
> performs the mapping? I mean, from the producer side, we want to make it
> transparent
It is transparent. The only change on the producer side is that instead of
KeyedMessage("topicname", message) you do
KeyedMessage("topicname", "eventName", message) (the arguments are topic,
key, message).
> and then have the configs decide which event will go to which consumer.
> Actually, in our scenario, we will have producers at the client end, and
> brokers and consumers at our end.
The client end has to do some level of effort to integrate, but it's not much
more than an additional config setting and one more param in the constructor.
> Will this be a feasible approach?
Well, your client would have to share the same ZooKeeper instances (assuming
0.8), and this is not going to be fantastic over a WAN, if by "client" you
mean another infrastructure.
If you do have another infrastructure posting to you, my suggestion would be
to front this with a REST-based service, have them post you some JSON or
something, and then have your public-facing interface tier send/produce that
into Kafka.
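The REST front-end suggestion can be sketched as a request handler that validates the client's JSON and only then produces it into Kafka from your own tier, so the external clients never connect to ZooKeeper or the brokers directly. This is a framework-free sketch: `handle_post`, `produce`, the `"events"` topic and the `event_type`/`payload` fields are all hypothetical, and the in-memory `produce` stands in for a real producer client. Wiring it into an actual HTTP server is left out.

```python
import json
from collections import defaultdict

# Hypothetical topic name; the event type is used as the message key, as discussed above.
EVENTS_TOPIC = "events"

# Hypothetical in-memory stand-in for a real Kafka producer in your public-facing tier.
PRODUCED = defaultdict(list)


def produce(topic, key, value):
    PRODUCED[topic].append((key, value))


def handle_post(body):
    """Handle one POSTed JSON event from an external client.

    Returns an (HTTP status, message) pair. Expects a JSON object with an
    "event_type" string and an optional "payload" (assumed field names).
    """
    try:
        event = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, "body is not valid JSON"
    if not isinstance(event, dict) or not isinstance(event.get("event_type"), str):
        return 400, "missing event_type"
    event_type = event["event_type"]
    produce(EVENTS_TOPIC, event_type, json.dumps(event.get("payload")))
    return 202, "accepted"


if __name__ == "__main__":
    print(handle_post(b'{"event_type": "TypeA", "payload": {"member_id": 42}}'))
    print(handle_post(b"not json"))
```

Returning 202 Accepted rather than 200 signals that the event has been handed off for asynchronous processing, not yet consumed.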
> I am also wondering whether we can include some sort of load balancing at
> the Kafka broker nodes?
This is now handled for you under the hood in 0.8.
> That is, depending on the load of the consumers, the brokers would write the
> events to the respective topics set for each consumer.