Kafka, mail # user - New Producer Public API


Re: New Producer Public API
S Ahmed 2014-02-06, 21:20
How about the following use case:

Just before the producer actually sends the payload to Kafka, could an
event be exposed that would allow one to loop through the messages and
potentially delete some of them?

Example:

Say you have 100 messages. Before you send them to Kafka, you could
aggregate many of them to reduce the message count. If there are messages
that store counts, you could combine these into a single message and then
send that to Kafka.
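
The aggregation described here could be sketched roughly as follows. This is
only an illustration of the idea, not part of any Kafka API: the
CountAggregator class, the CountMessage type, and the aggregate() method are
hypothetical names, and the sketch assumes each count message carries a
string key and a numeric increment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CountAggregator {

    /** Hypothetical message type: a counter name plus an increment. */
    public static final class CountMessage {
        public final String key;
        public final long count;

        public CountMessage(String key, long count) {
            this.key = key;
            this.count = count;
        }
    }

    /**
     * Collapses all pending messages that share a key into one message
     * carrying the summed count. Keys keep their first-seen order.
     */
    public static List<CountMessage> aggregate(List<CountMessage> pending) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (CountMessage m : pending) {
            totals.merge(m.key, m.count, Long::sum);
        }
        List<CountMessage> aggregated = new ArrayList<>();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            aggregated.add(new CountMessage(e.getKey(), e.getValue()));
        }
        return aggregated;
    }
}
```

A pre-send hook, if the producer exposed one, would call something like
aggregate() on the buffered batch and hand the shorter list to the sender.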

Thoughts?

On Wed, Feb 5, 2014 at 2:03 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:

> It might. I considered this but ended up going this way. Now that we have
> changed partitionKey=>partition it almost works. The difference is the
> consumer gets an offset too which the producer doesn't have.
>
> One thing I think this points to is the value of getting the consumer Java
> API worked out even in the absence of an implementation, just so we can
> write some fake code that uses both and kind of see how it feels.
>
> -Jay
>
>
> On Wed, Feb 5, 2014 at 10:23 AM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
>
> > Currently, the user will send ProducerRecords using the new producer. The
> > expectation will be that you get the same thing as output from the
> > consumer. Since ProducerRecord is a holder for topic, partition, key and
> > value, does it make sense to rename it to just Record? So, the send/receive
> > APIs would look like the following -
> >
> > producer.send(Record record);
> > List<Record> poll();
> >
> > Thoughts?
> >
> >
> > On Sun, Feb 2, 2014 at 4:12 PM, Guozhang Wang <[EMAIL PROTECTED]> wrote:
> >
> > > I think the most common motivation for having a customized partitioner is
> > > to make sure some messages always go to the same partition, but people may
> > > seldom want to know exactly which partition they go to. If that is
> > > true, why not just assign the same byte array as partition key with the
> > > default hash based partitioning in option 1.A? But again, that is based on
> > > my presumption that very few users would really want to specify the
> > > partition id.
> > >
> > >
> > >
> > > On Fri, Jan 31, 2014 at 2:44 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hey Tom,
> > > >
> > > > Agreed, there is definitely nothing that prevents our including
> > > > partitioner implementations, but it does get a little less seamless.
> > > >
> > > > -Jay
> > > >
> > > >
> > > > On Fri, Jan 31, 2014 at 2:35 PM, Tom Brown <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Regarding partitioning APIs, I don't think there is a common subset of
> > > > > information that is required for all strategies. Instead of modifying
> > > > > the core API to easily support all of the various partitioning
> > > > > strategies, offer the most common ones as libraries users can build into
> > > > > their own data pipeline, just like serialization. The core API would
> > > > > simply accept a partition index. You could include one default strategy
> > > > > (random) that only applies if they set "-1" for the partition index.
> > > > >
> > > > > That way, each partitioning strategy could have its own API that makes
> > > > > sense for it. For example, a round-robin partitioner only needs one
> > > > > method: "nextPartition()", while a hash-based one needs
> > > > > "getPartitionFor(byte[])".
> > > > >
> > > > > For those who actually need a pluggable strategy, a superset of the API
> > > > > could be codified into an interface (perhaps the existing partitioner
> > > > > interface), but it would still have to be used from outside of the core
> > > > > API.
> > > > >
> > > > > This design would make the core API less confusing (when do I use a
> > > > > partition key instead of a partition index, does the key overwrite the
> > > > > index, can the key be null, etc.?) while still providing the flexibility
> > > > > you want.
> > > > >
> > >
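
Tom Brown's suggestion quoted above, where each partitioning strategy has its
own small API and the core producer only accepts a partition index, might
look roughly like the sketch below. The method names nextPartition() and
getPartitionFor(byte[]) come from his message; the class names, the
constructors, and the use of Arrays.hashCode as the hash function are
assumptions made for illustration and are not taken from the Kafka codebase.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class PartitionerSketch {

    /** Cycles through partitions 0..numPartitions-1; needs no key at all. */
    public static final class RoundRobinPartitioner {
        private final int numPartitions;
        private final AtomicInteger counter = new AtomicInteger(0);

        public RoundRobinPartitioner(int numPartitions) {
            this.numPartitions = numPartitions;
        }

        public int nextPartition() {
            // floorMod keeps the result non-negative if the counter wraps around.
            return Math.floorMod(counter.getAndIncrement(), numPartitions);
        }
    }

    /** Maps equal keys to the same partition (hash function is an assumption). */
    public static final class HashPartitioner {
        private final int numPartitions;

        public HashPartitioner(int numPartitions) {
            this.numPartitions = numPartitions;
        }

        public int getPartitionFor(byte[] key) {
            return Math.floorMod(Arrays.hashCode(key), numPartitions);
        }
    }
}
```

In this arrangement the caller would compute the index with whichever helper
fits and pass the resulting integer to the producer, so the core API never
needs to know which strategy produced it.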