Kafka, mail # dev - produce request wire format question


Dave Peterson 2013-05-21, 16:55
Neha Narkhede 2013-05-21, 17:47
Dave Peterson 2013-05-21, 18:48
Colin Blower 2013-05-21, 19:05
Neha Narkhede 2013-05-21, 19:31
Dave Peterson 2013-05-21, 20:13
Jun Rao 2013-05-22, 16:28
Dave Peterson 2013-05-22, 20:33
Re: produce request wire format question
Neha Narkhede 2013-05-22, 23:30
1. Correct.
2. The producer does not use or depend on ZooKeeper anymore. It refreshes
its view of the cluster metadata by sending a TopicMetadataRequest to any of
the Kafka brokers. It maps a message to a partition using the following
rules:
2.1 If a message has no key, use any available partition.
2.2 If a message has a key and the user has defined a custom partitioner,
use it to map the key to a partition id.
2.3 If a message has a key and the user has not defined a custom
partitioner, use the default hash-based partitioner that ships with Kafka.
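A minimal Python sketch of the three partitioning rules above. The function names (`choose_partition`, `default_hash_partition`) are hypothetical. The partition list is assumed to come from the producer's cached topic metadata. CRC32 is only a stand-in for the hash that Kafka's real default partitioner uses.

```python
import random
import zlib
from typing import Callable, Optional, Sequence

# Hypothetical type: a custom partitioner maps (key, number of partitions) -> partition id.
Partitioner = Callable[[bytes, int], int]


def default_hash_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in for Kafka's default hash-based partitioner. CRC32 is an
    # assumption here, not the hash Kafka itself uses.
    return zlib.crc32(key) % num_partitions


def choose_partition(
    key: Optional[bytes],
    available_partitions: Sequence[int],
    num_partitions: int,
    custom_partitioner: Optional[Partitioner] = None,
) -> int:
    # Rule 2.1: no key, so any available partition will do.
    if key is None:
        return random.choice(list(available_partitions))
    # Rule 2.2: keyed message and a user-supplied partitioner.
    if custom_partitioner is not None:
        return custom_partitioner(key, num_partitions)
    # Rule 2.3: keyed message with no custom partitioner.
    return default_hash_partition(key, num_partitions)
```

The hash is deterministic, so every message with the same key lands in the same partition as long as the partition count does not change. Null-keyed messages are spread across whatever partitions the metadata currently reports as available.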

Thanks,
Neha
On Wed, May 22, 2013 at 1:33 PM, Dave Peterson <[EMAIL PROTECTED]> wrote:

> Ok, the picture I have in my mind of how things work in 0.8 (from a
> producer's point of view) is as follows:
>
>     1.  An application program sends log messages to a producer.  Each
>         message is provided as a key/value pair, where the key is chosen
>         by the application and the value is the message contents.  By its
>         choice of key, the application may influence or control which
>         partition the message gets sent to.
>
>     2.  The producer receives messages as key/value pairs.  From talking
>         with zookeeper, it knows the set of available brokers and which
>         partitions each broker has.  If the sending application provided a
>         key for a given message, the contents of the key may optionally
>         influence the producer's choice of broker and partition to send the
>         message to, according to some convention understood by both
>         application program and producer.
>
> Is this correct?
>
> Thanks,
> Dave
>
> On Wed, May 22, 2013 at 9:28 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > Dave,
> >
> > Currently, the broker expects each produce request to specify the exact
> > partition id (-1 is no longer valid). The mapping from a message to a
> > partition is done at the producer client. The producer can choose a
> > random partition (from the existing list of partitions) or
> > deterministically choose a partition based on the key.
> >
> > Thanks,
> >
> > Jun
> >
> >
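Here is a rough sketch of what "the mapping is done at the producer client" means in practice: every message is resolved to a concrete, non-negative partition id before the request is assembled. The `group_by_partition` helper and the `pick_partition` callback are hypothetical. The nested dict (topic, then partition, then messages) is a simplified stand-in for the real produce request.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# A message as a (key, value) pair; the key may be None.
Message = Tuple[Optional[bytes], bytes]


def group_by_partition(
    topic: str,
    messages: List[Message],
    pick_partition: Callable[[Optional[bytes]], int],
) -> Dict[str, Dict[int, List[Message]]]:
    """Resolve every message to a concrete partition before building the request."""
    by_partition: Dict[int, List[Message]] = defaultdict(list)
    for key, value in messages:
        partition = pick_partition(key)
        # The broker needs an exact partition id; -1 ("any partition") is rejected.
        if partition < 0:
            raise ValueError(f"invalid partition id {partition} for topic {topic!r}")
        by_partition[partition].append((key, value))
    return {topic: dict(by_partition)}
```

Each resulting per-partition list corresponds to one message set. Because grouping happens after partition selection, a single message set never mixes messages bound for different partitions.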
> > On Tue, May 21, 2013 at 1:12 PM, Dave Peterson <[EMAIL PROTECTED]> wrote:
> >
> >> In my case, there is a load balancer between the producers and the
> >> brokers, so I want the behavior described for the Java client (null key
> >> specifies "any partition").  If the Key field of each individual message
> >> specifies the partition to send it to, then I don't understand the
> >> purpose of the 32-bit partition identifier that precedes each message
> >> set in a produce request: what if a produce request specifies
> >> "partition N" for a given message set, and then each individual message
> >> in the set specifies a different partition in its Key field?  Also, the
> >> above-mentioned partition identifier is a 32-bit integer and the Key
> >> field of each individual message can contain data of arbitrary length,
> >> which seems inconsistent.  Is a partition identifier a 32-bit integer,
> >> or can it be of arbitrary length?
> >>
> >> Thanks,
> >> Dave
> >>
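The sketch below illustrates the distinction behind this question by encoding two fields separately. It is based on a reading of the 0.8 wire protocol, which should be checked against the official protocol guide. The partition id is a fixed-width int32 in the per-partition part of the request. A message key is an opaque, length-prefixed byte string inside each message. The client hashes the key to pick the partition, and the broker routes on the partition id, not on the key. The helper names are hypothetical. The CRC, magic byte, attributes, and offsets of a full message set are omitted.

```python
import struct
from typing import Optional


def encode_partition_entry(partition: int, message_set: bytes) -> bytes:
    # Partition id: fixed-width, big-endian signed 32-bit integer,
    # followed by the message set's size (also int32) and the set itself.
    return struct.pack(">ii", partition, len(message_set)) + message_set


def encode_key(key: Optional[bytes]) -> bytes:
    # A message key is an opaque, length-prefixed byte string:
    # int32 length (-1 for a null key) followed by the raw bytes.
    if key is None:
        return struct.pack(">i", -1)
    return struct.pack(">i", len(key)) + key
```

Under this layout, a key of any length fits in the message, while the partition it maps to is always a single 4-byte integer.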
> >> On Tue, May 21, 2013 at 12:30 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> >> > Dave,
> >> >
> >> > Colin correctly described the producer behavior of picking the
> >> > partition for a message before it is sent to the Kafka broker.
> >> > However, I'm interested in knowing a little more about your use case,
> >> > to understand why you would rather have the broker decide the
> >> > partition.
> >> >
> >> > Thanks,
> >> > Neha
> >> >
> >> >
> >> > On Tue, May 21, 2013 at 12:05 PM, Colin Blower <[EMAIL PROTECTED]> wrote:
> >> >
> >> >> The key is used by the client to decide which partition to send the
> >> >> message to. By the time the client is creating the produce request,
> >> >> it should be known which partition each message is being sent to. I
> >> >> believe Neha described the behavior of the Java client, which sends
> >> >> messages with a null key to any partition.
> >> >>

 
Dave Peterson 2013-05-23, 16:43
Neha Narkhede 2013-05-23, 16:56
Colin Blower 2013-05-23, 16:57