Kafka >> mail # user >> who is using kafka to store large messages?


Re: who is using kafka to store large messages?
Ah,

I think I remember a previous discussion on a way to avoid the double
compression....

So would it be possible for the producer to send metadata with a compressed
batch that includes the logical offset info for the batch?  Can this info
just be a count of how many messages are in the batch?

And to be clear, if uncompressed messages come in, they remain uncompressed
in the broker, correct?

Jason
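Jason's proposal above could be sketched roughly like this. Everything here is hypothetical and illustrative, not Kafka's actual wire format or broker code: the producer attaches a message count to each compressed batch, which would let the broker reserve a contiguous offset range without decompressing.

```python
import gzip

# Illustrative sketch of the proposal in this thread (hypothetical, not
# Kafka's real protocol): the producer sends a count alongside the
# compressed payload.

def make_batch(messages):
    payload = gzip.compress(b"\n".join(messages))
    return {"count": len(messages), "payload": payload}

def append_batch(log_end_offset, batch):
    # Broker side: reserve offsets [log_end_offset, log_end_offset + count)
    # using only the metadata, without decompressing the payload.
    base = log_end_offset
    new_end = base + batch["count"]
    return base, new_end

batch = make_batch([b"msg-a", b"msg-b", b"msg-c"])
base, end = append_batch(100, batch)
# base == 100, end == 103: three logical offsets assigned without decompressing
```

Whether a bare count is sufficient in practice depends on whether the broker ever needs per-message data (e.g. keys or timestamps) at append time, which is exactly the open question in the thread.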
On Tue, Oct 8, 2013 at 10:20 AM, Neha Narkhede <[EMAIL PROTECTED]> wrote:

> The broker only recompresses the messages if the producer sent them
> compressed. And it has to recompress to assign the logical offsets to the
> individual messages inside the compressed message.
>
> Thanks,
> Neha
> On Oct 7, 2013 11:36 PM, "Jason Rosenberg" <[EMAIL PROTECTED]> wrote:
>
> > Neha,
> >
> > Does the broker store messages compressed, even if the producer doesn't
> > compress them when sending them to the broker?
> >
> > Why does the broker re-compress message batches?  Does it not have enough
> > info from the producer request to know the number of messages in the
> > batch?
> >
> > Jason
> >
> >
> > On Mon, Oct 7, 2013 at 12:40 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> >
> > > the total message size of the batch should be less than
> > > message.max.bytes or is that for each individual message?
> > >
> > > The former is correct.
> > >
> > > When you batch, I am assuming that the producer sends some sort of flag
> > > that this is a batch, and then the broker will split up those messages
> > > to individual messages and store them in the log correct?
> > >
> > > The broker splits the compressed message into individual messages to
> > > assign the logical offsets to every message, but the data is finally
> > > stored compressed and is delivered in the compressed format to the
> > > consumer.
> > >
> > > Thanks,
> > > Neha
> > >
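Neha's description above can be sketched roughly as follows. This is an illustrative model, not Kafka's actual broker implementation (the real log format and compression handling differ): the broker decompresses the batch, assigns a logical offset to each inner message, then stores and serves the batch recompressed.

```python
import gzip

# Illustrative model (not Kafka's real code) of the decompress / assign
# offsets / recompress cycle described in this thread.

def broker_append(log_end_offset, compressed_batch):
    messages = gzip.decompress(compressed_batch).split(b"\n")
    # Assign sequential logical offsets to the individual messages.
    with_offsets = [(log_end_offset + i, m) for i, m in enumerate(messages)]
    # Re-compress: the log stores the batch compressed, and consumers
    # receive it in this compressed form.
    stored = gzip.compress(b"\n".join(b"%d:%s" % (o, m) for o, m in with_offsets))
    return with_offsets, stored

batch = gzip.compress(b"\n".join([b"a", b"b", b"c"]))
offsets, stored = broker_append(42, batch)
# offsets == [(42, b"a"), (43, b"b"), (44, b"c")]
```

The recompression step is what creates the broker-side CPU and memory cost discussed later in the thread; if the producer sends uncompressed messages, this cycle is skipped and they are stored as-is.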
> > >
> > > On Mon, Oct 7, 2013 at 9:26 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
> > >
> > > > When you batch things on the producer, say you batch 1000 messages or
> > > > by time whatever, the total message size of the batch should be less
> > > > than message.max.bytes or is that for each individual message?
> > > >
> > > > When you batch, I am assuming that the producer sends some sort of
> > > > flag that this is a batch, and then the broker will split up those
> > > > messages to individual messages and store them in the log correct?
> > > >
> > > >
> > > > On Mon, Oct 7, 2013 at 12:21 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > The message size limit is imposed on the compressed message. To
> > > > > answer your question about the effect of large messages - they
> > > > > cause memory pressure on the Kafka brokers as well as on the
> > > > > consumer, since we re-compress messages on the broker and decompress
> > > > > messages on the consumer.
> > > > >
> > > > > I'm not so sure that large messages will have a hit on latency,
> > > > > since compressing a few large messages vs compressing lots of small
> > > > > messages with the same content should not be any slower. But you
> > > > > want to be careful about the batch size, since you don't want the
> > > > > compressed message to exceed the message size limit.
> > > > >
> > > > > Thanks,
> > > > > Neha
> > > > >
> > > > >
> > > > > On Mon, Oct 7, 2013 at 9:10 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > I see, so one thing to consider is that if I have 20 KB messages,
> > > > > > I shouldn't batch too many together, as that will increase latency
> > > > > > and the memory usage footprint on the producer side of things.
> > > > > >
> > > > > >
> > > > > > On Mon, Oct 7, 2013 at 11:55 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > At LinkedIn, our message size can be 10s of KB. This is mostly
> > > > > > > because we batch a set of messages and send them as a single
> > > > > > > compressed
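Rounding out the sizing discussion in the thread: the limit applies to the whole compressed batch, and since the compressed size isn't known up front, a producer can bound the uncompressed batch size as a conservative check. The 1,000,000-byte limit below is an assumption (a commonly cited default for message.max.bytes); check your broker's actual configuration.

```python
# Rough sizing check: message.max.bytes limits the *compressed* batch, per
# the thread above. A cautious producer can bound the uncompressed batch
# size instead, which is conservative since compression typically shrinks
# the payload.

MESSAGE_MAX_BYTES = 1_000_000  # assumed broker setting; verify your config

def max_batch_count(avg_message_bytes, limit=MESSAGE_MAX_BYTES):
    # Conservative: assume no compression benefit at all.
    return limit // avg_message_bytes

# With ~20 KB messages, as in the thread:
n = max_batch_count(20 * 1024)
# n == 48: batching more than ~48 such messages risks exceeding the limit
# even before accounting for compression.
```

In practice the safe batch count is larger once compression is factored in, but how much larger depends entirely on how compressible the payload is.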