Kafka, mail # user - who is using kafka to store large messages?


Re: who is using kafka to store large messages?
Neha Narkhede 2013-10-08, 16:37
> And to be clear, if uncompressed messages come in, they remain uncompressed
> in the broker, correct?

Correct

Currently, only the broker knows the offsets for a partition, so it is the
right place to assign them. Even if the producer sends metadata, the broker
still needs to decompress the data to get a handle on each individual
message and assign it a logical offset.
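
A minimal sketch of the decompress, assign, recompress cycle described above,
in Python. The PartitionLog class, the JSON-inside-gzip wrapper format, and all
function names are hypothetical stand-ins for illustration; they are not
Kafka's actual message format or broker code.

```python
import gzip
import json


def compress_batch(messages):
    """Pack a list of message dicts into one gzip-compressed wrapper payload."""
    return gzip.compress(json.dumps(messages).encode("utf-8"))


def decompress_batch(payload):
    """Unpack a wrapper payload back into its list of message dicts."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


class PartitionLog:
    """Hypothetical stand-in for the broker-side log of one partition."""

    def __init__(self):
        self.next_offset = 0  # only the broker knows this value
        self.segments = []    # stored entries, compressed or not

    def append_compressed(self, payload):
        # Decompress to reach the individual messages inside the wrapper.
        inner = decompress_batch(payload)
        # Assign a logical offset to every inner message.
        for message in inner:
            message["offset"] = self.next_offset
            self.next_offset += 1
        # Recompress so the batch is stored, and later served, compressed.
        stored = compress_batch(inner)
        self.segments.append(stored)
        return stored

    def append_uncompressed(self, message):
        # Uncompressed input stays uncompressed; it only gets an offset.
        stored = dict(message, offset=self.next_offset)
        self.next_offset += 1
        self.segments.append(stored)
        return stored
```

In this sketch a producer-supplied message count would let the broker reserve
a range of offsets, but stamping each inner message with its own offset still
requires opening the wrapper, which is the point made above.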

One of the JIRAs discussing this is here -
https://issues.apache.org/jira/browse/KAFKA-595

Thanks,
Neha
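
On the size limit discussed in the quoted messages below: since the limit
applies to the compressed batch, a producer-side pre-check would compare the
compressed size of the batch, not the sum of the raw message sizes, against
message.max.bytes. A rough sketch in Python, assuming gzip compression; the
limit value and both function names are hypothetical, and a real wrapper
message adds per-message framing overhead that this ignores.

```python
import gzip

# Assumed value for illustration only; use your broker's message.max.bytes.
MESSAGE_MAX_BYTES = 1_000_000


def compressed_size(messages):
    """Size in bytes of the batch after gzip, ignoring framing overhead."""
    return len(gzip.compress(b"".join(messages)))


def fits_limit(messages, limit=MESSAGE_MAX_BYTES):
    """True if the compressed batch is within the broker's size limit."""
    return compressed_size(messages) <= limit
```

A batch of highly repetitive messages can exceed the limit in raw bytes and
still pass this check, while a single incompressible payload larger than the
limit will not.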

On Tue, Oct 8, 2013 at 9:24 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:

> Ah,
>
> I think I remember a previous discussion on a way to avoid the double
> compression....
>
> So would it be possible for the producer to send metadata with a compressed
> batch that includes the logical offset info for the batch?  Can this info
> just be a count of how many messages are in the batch?
>
> And to be clear, if uncompressed messages come in, they remain uncompressed
> in the broker, correct?
>
> Jason
>
>
> On Tue, Oct 8, 2013 at 10:20 AM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
>
> > The broker only recompresses the messages if the producer sent them
> > compressed. And it has to recompress to assign the logical offsets to the
> > individual messages inside the compressed message.
> >
> > Thanks,
> > Neha
> > On Oct 7, 2013 11:36 PM, "Jason Rosenberg" <[EMAIL PROTECTED]> wrote:
> >
> > > Neha,
> > >
> > > Does the broker store messages compressed, even if the producer doesn't
> > > compress them when sending them to the broker?
> > >
> > > Why does the broker re-compress message batches?  Does it not have enough
> > > info from the producer request to know the number of messages in the batch?
> > >
> > > Jason
> > >
> > >
> > > On Mon, Oct 7, 2013 at 12:40 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> > >
> > > > the total message size of the batch should be less than
> > > > message.max.bytes or is that for each individual message?
> > > >
> > > > The former is correct.
> > > >
> > > > When you batch, I am assuming that the producer sends some sort of flag
> > > > that this is a batch, and then the broker will split up those messages to
> > > > individual messages and store them in the log correct?
> > > >
> > > > The broker splits the compressed message into individual messages to assign
> > > > the logical offsets to every message, but the data is finally stored
> > > > compressed and is delivered in the compressed format to the consumer.
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > >
> > > > On Mon, Oct 7, 2013 at 9:26 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > When you batch things on the producer, say you batch 1000 messages or by
> > > > > time whatever, the total message size of the batch should be less than
> > > > > message.max.bytes or is that for each individual message?
> > > > >
> > > > > When you batch, I am assuming that the producer sends some sort of flag
> > > > > that this is a batch, and then the broker will split up those messages to
> > > > > individual messages and store them in the log correct?
> > > > >
> > > > >
> > > > > On Mon, Oct 7, 2013 at 12:21 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > The message size limit is imposed on the compressed message. To answer your
> > > > > > question about the effect of large messages - they cause memory pressure on
> > > > > > the Kafka brokers as well as on the consumer since we re-compress messages
> > > > > > on the broker and decompress messages on the consumer.
> > > > > >
> > > > > > I'm not so sure that large messages will have a hit on latency since
> > > > > > compressing a few large messages vs compressing lots of small messages with
> > > > > > the same content, should not be any slower. But you want to be careful on
> > > > > > the batch size since you don't want the compressed message to