Re: New Producer Public API
Currently, the user will send ProducerRecords using the new producer. The
expectation will be that you get the same thing as output from the
consumer. Since ProducerRecord is a holder for topic, partition, key, and
value, does it make sense to rename it to just Record? So, the send/receive
APIs would look like the following -

producer.send(Record record);
List<Record> poll();
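
For illustration only, here is a minimal sketch of what such a unified Record holder might look like (a hypothetical shape, not the actual proposal):

// Hypothetical sketch: one Record type shared by producer and consumer.
public class Record {
    private final String topic;
    private final Integer partition; // null when left to the partitioner
    private final byte[] key;
    private final byte[] value;

    public Record(String topic, Integer partition, byte[] key, byte[] value) {
        this.topic = topic;
        this.partition = partition;
        this.key = key;
        this.value = value;
    }

    public String topic() { return topic; }
    public Integer partition() { return partition; }
    public byte[] key() { return key; }
    public byte[] value() { return value; }
}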

Thoughts?
On Sun, Feb 2, 2014 at 4:12 PM, Guozhang Wang <[EMAIL PROTECTED]> wrote:

> I think the most common motivation for having a customized partitioner is
> to make sure some messages always go to the same partition, but people
> seldom want to know exactly which partition they go to. If that is true,
> why not just assign the same byte array as the partition key with the
> default hash-based partitioning in option 1.A? But again, that is based on
> my presumption that very few users would really want to specify the
> partition id.
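
A minimal sketch of that idea, assuming a simple hash-based default partitioner (the helper below is hypothetical, not the actual implementation):

import java.util.Arrays;

// Records carrying the same partition key always land in the same
// partition; the sender never needs to know which partition that is.
static int partitionForKey(byte[] partitionKey, int numPartitions) {
    return (Arrays.hashCode(partitionKey) & 0x7fffffff) % numPartitions;
}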
>
>
>
> On Fri, Jan 31, 2014 at 2:44 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
> > Hey Tom,
> >
> > Agreed, there is definitely nothing that prevents our including
> > partitioner implementations, but it does get a little less seamless.
> >
> > -Jay
> >
> >
> > On Fri, Jan 31, 2014 at 2:35 PM, Tom Brown <[EMAIL PROTECTED]> wrote:
> >
> > > Regarding partitioning APIs, I don't think there is a common subset of
> > > information that is required by all strategies. Instead of modifying the
> > > core API to support every partitioning strategy, offer the most common
> > > ones as libraries that users can build into their own data pipeline,
> > > just like serialization. The core API would simply accept a partition
> > > index. You could include one default strategy (random) that applies only
> > > when the partition index is set to -1.
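
Under that scheme, the core send call might look like this (hypothetical signature):

// Illustrative only: the caller supplies an explicit partition index,
// or -1 to fall back to the built-in random strategy.
producer.send("events", 3, key, value);   // explicit partition
producer.send("events", -1, key, value);  // -1 => random default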
> > >
> > > That way, each partitioning strategy could have its own API that makes
> > > sense for it. For example, a round-robin partitioner only needs one
> > > method, "nextPartition()", while a hash-based one needs
> > > "getPartitionFor(byte[])".
> > >
> > > For those who actually need a pluggable strategy, a superset of the API
> > > could be codified into an interface (perhaps the existing partitioner
> > > interface), but it would still have to be used from outside of the core
> > > API.
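
One possible shape for that superset interface (hypothetical, loosely modeled on the existing partitioner interface):

// Pluggable strategy used from outside the core API: the caller
// computes an index and passes it to send() explicitly.
interface Partitioner {
    int partition(String topic, byte[] key, int numPartitions);
}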
> > >
> > > This design would make the core API less confusing (when do I use a
> > > partition key instead of a partition index, does the key override the
> > > index, can the key be null, etc.?) while still providing the flexibility
> > > you want.
> > >
> > > --Tom
> > >
> > > On Fri, Jan 31, 2014 at 12:07 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> > >
> > > > Oliver,
> > > >
> > > > Yeah, that was my original plan--allow the registration of multiple
> > > > callbacks on the future. But there is some additional implementation
> > > > complexity, because then you need more synchronization variables to
> > > > ensure the callback gets executed even if the request has already
> > > > completed at the time the callback is registered. It also makes the
> > > > order of callback execution unpredictable--I want to be able to
> > > > guarantee that, for a particular partition, callbacks for lower-offset
> > > > messages happen before callbacks for higher-offset messages, so that
> > > > if you maintain a high-water mark or something it is easy to reason
> > > > about. This has the added benefit that callbacks ALWAYS execute in the
> > > > I/O thread instead of it being non-deterministic, which is a little
> > > > confusing.
> > > >
> > > > I thought a single callback is sufficient since you can always
> > > > include multiple actions in that callback, and I think that case is
> > > > rare anyway.
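
A minimal sketch of folding several actions into one callback (the Callback shape below is hypothetical, not the actual API):

import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-callback interface.
interface Callback {
    void onCompletion(long offset, Exception exception);
}

class CompositeCallbackExample {
    static final AtomicLong highWaterMark = new AtomicLong(-1);
    static final AtomicLong acked = new AtomicLong();

    // One callback composing two actions: advance a high-water mark
    // and count acknowledged sends.
    static final Callback CALLBACK = (offset, exception) -> {
        if (exception == null) {
            highWaterMark.accumulateAndGet(offset, Math::max);
            acked.incrementAndGet();
        }
    };
}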
> > > >
> > > > I did think about the possibility of adding a thread pool for
> > > > handling the callbacks. But there are a lot of possible configurations
> > > > for such a thread pool, and a simplistic approach would no longer
> > > > guarantee in-order