Kafka >> mail # user >> Consumer re-design and Python

Re: Consumer re-design and Python
On 1/31/13 3:30 PM, Marc Labbe wrote:
> Hi,
> I am fairly new to Kafka and Scala, I am trying to see through the consumer
> re-design changes, proposed and implemented for 0.8 and after, which will
> affect other languages implementations. There are documentation pages on
> the wiki, JIRA issues but I still can't figure out what's already there for
> 0.8, what will be there in the future and how it affects the consumers
> written in other languages (Python in my case).
> For instance, I am looking at
> https://cwiki.apache.org/KAFKA/consumer-client-re-design.html and the very
> well documented
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Detailed+Consumer+Coordinator+Design
> and
> I am not sure what part is in the works, done and still a proposal. I feel
> there are changes there already in 0.8 but not completely, referring
> especially to KAFKA-364 and KAFKA-264.



are the current design docs (as far as I know).
> Is this all accurate and up to date? There are talks of a coordinator as
> well but from what I see, this hasn't been implemented so far.
From my understanding, the client redesign has not been finalized and
is still in progress/to do.
> After all, maybe my question is: other than the wire protocol changes, what
> changes should I expect to do to SimpleConsumer client written in Python
> for v0.8? What should I do next to implement a high level consumer
> (ZookeeperConsumerConnector?) which fits with the design proposal?
With 0.8, you will not need to connect to ZooKeeper from the clients.
With KAFKA-657, offsets are centrally managed by the broker. Any broker
can handle these requests.
> Has anyone started making changes to their implementation yet (thinking
> Brod or Samsa)? I'll post that question on github too.
I am working on updating my Python client:
https://github.com/mumrah/kafka-python, still a ways to go yet. The
biggest change (besides centralized offset management) is that each
topic+partition is owned by a specific broker (the leader). When
producing messages, you must send them to the correct leader. This
requires that clients keep track of which broker leads which partition,
which is a pain, but such is the cost of replication.
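
To make the bookkeeping concrete, here is a rough sketch of the kind of state a client might keep. It is not taken from kafka-python or any other client; `Broker`, `LeaderCache` and `group_by_leader` are hypothetical names, and it assumes the client has already obtained a topic+partition to leader mapping from the cluster by some means.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Broker(NamedTuple):
    """Hypothetical broker address; not part of any real client API."""
    host: str
    port: int


TopicPartition = Tuple[str, int]
Message = Tuple[str, int, bytes]  # (topic, partition, payload)


class LeaderCache:
    """Remembers which broker currently leads each topic+partition."""

    def __init__(self) -> None:
        self._leaders: Dict[TopicPartition, Broker] = {}

    def update(self, leaders: Dict[TopicPartition, Broker]) -> None:
        # Merge freshly fetched leader information into the cache.
        self._leaders.update(leaders)

    def invalidate(self, topic: str, partition: int) -> None:
        # Forget a leader, e.g. after a failed send, so that the caller
        # knows it has to refresh before retrying.
        self._leaders.pop((topic, partition), None)

    def leader_for(self, topic: str, partition: int) -> Broker:
        try:
            return self._leaders[(topic, partition)]
        except KeyError:
            raise LookupError(
                "no known leader for %s/%d; refresh needed" % (topic, partition)
            )


def group_by_leader(
    cache: LeaderCache, messages: Iterable[Message]
) -> Dict[Broker, List[Message]]:
    """Group messages so that each batch goes to the broker leading it."""
    batches: Dict[Broker, List[Message]] = defaultdict(list)
    for topic, partition, payload in messages:
        batches[cache.leader_for(topic, partition)].append(
            (topic, partition, payload)
        )
    return dict(batches)
```

A producer built on this would call `group_by_leader` before each send, open one connection per broker in the result, and call `invalidate` followed by a refresh whenever a broker rejects a request for a partition it no longer leads.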
> Thanks and cheers!
> marc