Kafka, mail # user - New Producer Public API


Re: New Producer Public API
Jay Kreps 2014-02-06, 21:36
This is pretty hard to do with the architecture we've gone with as the
stored events are not objects, but tightly packed serialized bytes. This
approach is much better from a performance and memory management point of
view, though, so I'd be very hesitant to change it. So it is pretty hard to
provide a usable API for this. I think this is likely something that would be
better implemented in the application (e.g., use a blocking queue and batch
into a single message after the timeout).
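
The application-side approach described above could look roughly like the
following. This is only a minimal sketch under stated assumptions: the class
CountBatcher, the CountEvent type, and the method names are hypothetical and
are not part of the producer API. It collects count events from a blocking
queue until a timeout or size limit is reached and sums them per key, so the
application can serialize the totals into one message and send that instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical application-side batcher; not part of the Kafka producer API. */
final class CountBatcher {

    /** One counter increment, e.g. ("page_views", 1). Hypothetical type. */
    static final class CountEvent {
        final String key;
        final long delta;

        CountEvent(String key, long delta) {
            this.key = key;
            this.delta = delta;
        }
    }

    private final BlockingQueue<CountEvent> queue = new LinkedBlockingQueue<>();

    /** Called by application threads instead of sending each event directly. */
    void record(String key, long delta) {
        queue.add(new CountEvent(key, delta));
    }

    /**
     * Collects events until the timeout expires or maxEvents have been read,
     * summing the deltas per key. The caller would then serialize the returned
     * totals into a single message and hand it to the producer.
     */
    Map<String, Long> drainAndAggregate(long timeoutMs, int maxEvents)
            throws InterruptedException {
        Map<String, Long> totals = new HashMap<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        int seen = 0;
        while (seen < maxEvents) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                break; // batch window is over
            }
            CountEvent event = queue.poll(remaining, TimeUnit.NANOSECONDS);
            if (event == null) {
                break; // timed out waiting for more events
            }
            totals.merge(event.key, event.delta, Long::sum);
            seen++;
        }
        return totals;
    }
}
```

A single sender thread would call drainAndAggregate in a loop and send one
message per non-empty result. Doing the aggregation before the send call keeps
the producer's packed byte buffers untouched, which is the constraint described
above.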

-Jay
On Thu, Feb 6, 2014 at 1:16 PM, S Ahmed <[EMAIL PROTECTED]> wrote:

> How about the following use case:
>
> Just before the producer actually sends the payload to Kafka, could an
> event be exposed that would allow one to loop through the messages and
> potentially delete some of them?
>
> Example:
>
> Say you have 100 messages, but before you send these messages to Kafka, you
> can easily aggregate many of them to reduce the message count.
> If there are messages that store counts, you could aggregate these into a
> single message and then send that to Kafka.
>
> Thoughts?
>
>
>
> On Wed, Feb 5, 2014 at 2:03 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
>
> > It might. I considered this but ended up going this way. Now that we have
> > changed partitionKey=>partition it almost works. The difference is the
> > consumer gets an offset too which the producer doesn't have.
> >
> > One thing I think this points to is the value of getting the consumer Java
> > API worked out even in the absence of an implementation, just so we can
> > write some fake code that uses both and kind of see how it feels.
> >
> > -Jay
> >
> >
> > On Wed, Feb 5, 2014 at 10:23 AM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> >
> > > Currently, the user will send ProducerRecords using the new producer. The
> > > expectation will be that you get the same thing as output from the
> > > consumer. Since ProducerRecord is a holder for topic, partition, key and
> > > value, does it make sense to rename it to just Record? So, the send/receive
> > > APIs would look like the following -
> > >
> > > producer.send(Record record);
> > > List<Record> poll();
> > >
> > > Thoughts?
> > >
> > >
> > > On Sun, Feb 2, 2014 at 4:12 PM, Guozhang Wang <[EMAIL PROTECTED]> wrote:
> > >
> > > > I think the most common motivation for having a customized partitioner is
> > > > to make sure some messages always go to the same partition, but people may
> > > > seldom want to know exactly which partition they go to. If that is
> > > > true, why not just assign the same byte array as the partition key with the
> > > > default hash-based partitioning in option 1.A? But again, that is based on
> > > > my presumption that very few users would really want to specify the
> > > > partition id.
> > > >
> > > >
> > > >
> > > > On Fri, Jan 31, 2014 at 2:44 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hey Tom,
> > > > >
> > > > > Agreed, there is definitely nothing that prevents our including partitioner
> > > > > implementations, but it does get a little less seamless.
> > > > >
> > > > > -Jay
> > > > >
> > > > >
> > > > > On Fri, Jan 31, 2014 at 2:35 PM, Tom Brown <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Regarding partitioning APIs, I don't think there is a common subset of
> > > > > > information that is required for all strategies. Instead of modifying the
> > > > > > core API to easily support all of the various partitioning strategies,
> > > > > > offer the most common ones as libraries that users can build into their own
> > > > > > data pipeline, just like serialization. The core API would simply accept a
> > > > > > partition index. You could include one default strategy (random) that only
> > > > > > applies if they set "-1" for the partition index.
> > > > > >
> > > > > > That way, each partitioning strategy could have its own API that
> > > makes
> > > >