Kafka >> mail # user >> who is using kafka to store large messages?


S Ahmed 2013-10-07, 14:45
Jun Rao 2013-10-07, 15:55
S Ahmed 2013-10-07, 16:10
Neha Narkhede 2013-10-07, 16:22
S Ahmed 2013-10-07, 16:27
Benjamin Black 2013-10-07, 16:37
Neha Narkhede 2013-10-07, 16:41
Jason Rosenberg 2013-10-08, 06:36
Neha Narkhede 2013-10-08, 14:20
Jason Rosenberg 2013-10-08, 16:25
Re: who is using kafka to store large messages?
> And to be clear, if uncompressed messages come in, they remain
> uncompressed in the broker, correct?

Correct.

Currently, only the broker has knowledge of the offsets for a partition, and
hence it is the right place to assign them. Even if the producer sends
metadata, the broker still needs to decompress the data to get at the
individual messages and assign each one its logical offset.

One of the JIRAs discussing this is here -
https://issues.apache.org/jira/browse/KAFKA-595

Thanks,
Neha
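
To make the decompress-to-assign step concrete, here is a toy sketch of why a
compressed batch forces a decompress/recompress cycle on the broker. This is
not the actual broker code or the KAFKA-595 design: the class and method names
are invented, GZIP stands in for whichever codec the producer chose, and
newline-delimited payloads stand in for Kafka's binary message-set format.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustration only: a compressed "wrapper" holds many inner messages, and the
// only way to give each inner message its own logical offset is to inflate the
// wrapper, number the messages, and compress them again.
public class OffsetAssignmentSketch {

    record InnerMessage(long offset, byte[] payload) {}

    // Inflate the wrapper and assign consecutive logical offsets starting at nextOffset.
    static List<InnerMessage> assignOffsets(byte[] compressedWrapper, long nextOffset)
            throws IOException {
        List<InnerMessage> numbered = new ArrayList<>();
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressedWrapper))) {
            // Pretend the inner messages are newline-delimited strings.
            for (String line : new String(in.readAllBytes()).split("\n")) {
                numbered.add(new InnerMessage(nextOffset++, line.getBytes()));
            }
        }
        return numbered;
    }

    // Re-compress after numbering -- the "double compression" cost raised in this thread.
    static byte[] recompress(List<InnerMessage> messages) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            for (InnerMessage m : messages) {
                gz.write((m.offset() + ":" + new String(m.payload()) + "\n").getBytes());
            }
        }
        return buf.toByteArray();
    }
}

If the producer sends the batch uncompressed, none of this is needed: the
broker can number the messages in place, which is why uncompressed data stays
uncompressed on the broker.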

On Tue, Oct 8, 2013 at 9:24 AM, Jason Rosenberg <[EMAIL PROTECTED]> wrote:

> Ah,
>
> I think I remember a previous discussion on a way to avoid the double
> compression....
>
> So would it be possible for the producer to send metadata with a
> compressed batch that includes the logical offset info for the batch?
> Can this info just be a count of how many messages are in the batch?
>
> And to be clear, if uncompressed messages come in, they remain
> uncompressed in the broker, correct?
>
> Jason
>
>
> On Tue, Oct 8, 2013 at 10:20 AM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
>
> > The broker only recompresses the messages if the producer sent them
> > compressed. And it has to recompress to assign the logical offsets to
> > the individual messages inside the compressed message.
> >
> > Thanks,
> > Neha
> >
> > On Oct 7, 2013 11:36 PM, "Jason Rosenberg" <[EMAIL PROTECTED]> wrote:
> >
> > > Neha,
> > >
> > > Does the broker store messages compressed, even if the producer
> > > doesn't compress them when sending them to the broker?
> > >
> > > Why does the broker re-compress message batches? Does it not have
> > > enough info from the producer request to know the number of messages
> > > in the batch?
> > >
> > > Jason
> > >
> > >
> > > On Mon, Oct 7, 2013 at 12:40 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> > >
> > > > the total message size of the batch should be less than
> > > > message.max.bytes or is that for each individual message?
> > > >
> > > > The former is correct.
> > > >
> > > > When you batch, I am assuming that the producer sends some sort of
> > > > flag that this is a batch, and then the broker will split up those
> > > > messages to individual messages and store them in the log correct?
> > > >
> > > > The broker splits the compressed message into individual messages
> > > > to assign the logical offsets to every message, but the data is
> > > > finally stored compressed and is delivered in the compressed
> > > > format to the consumer.
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > >
> > > > On Mon, Oct 7, 2013 at 9:26 AM, S Ahmed <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > When you batch things on the producer, say you batch 1000
> > > > > messages or by time whatever, the total message size of the
> > > > > batch should be less than message.max.bytes or is that for each
> > > > > individual message?
> > > > >
> > > > > When you batch, I am assuming that the producer sends some sort
> > > > > of flag that this is a batch, and then the broker will split up
> > > > > those messages to individual messages and store them in the log
> > > > > correct?
> > > > >
> > > > >
> > > > > On Mon, Oct 7, 2013 at 12:21 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > The message size limit is imposed on the compressed message.
> > > > > > To answer your question about the effect of large messages -
> > > > > > they cause memory pressure on the Kafka brokers as well as on
> > > > > > the consumer, since we re-compress messages on the broker and
> > > > > > decompress messages on the consumer.
> > > > > >
> > > > > > I'm not so sure that large messages will have a hit on
> > > > > > latency, since compressing a few large messages vs compressing
> > > > > > lots of small messages with the same content should not be any
> > > > > > slower. But you want to be careful on the batch size since you
> > > > > > don't want the compressed message to ...
 
MB JA 2014-05-17, 03:57
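
Pulling the quoted thread together: the 0.8-era producer compresses a whole
batch into a single wrapper message, and the broker's message.max.bytes check
applies to that compressed wrapper, not to each inner message. Below is a
rough configuration sketch under that assumption; the property names are the
ones I believe the old (0.8) producer and broker use and should be verified
against the docs for your version, and the values are examples only.

import java.util.Properties;

// Illustrative settings only; sizes and counts are examples, not recommendations.
public class BatchCompressionConfigSketch {
    public static void main(String[] args) {
        // Producer side (0.8-era "old" producer properties).
        Properties producer = new Properties();
        producer.put("metadata.broker.list", "broker1:9092,broker2:9092");
        producer.put("producer.type", "async");        // enable client-side batching
        producer.put("batch.num.messages", "1000");    // the thread's example batch size
        producer.put("compression.codec", "snappy");   // the whole batch becomes one compressed wrapper
        producer.put("queue.buffering.max.ms", "500"); // how long to wait to fill a batch

        // Broker side (server.properties): the limit is enforced against the
        // compressed wrapper, so batch size x average message size after
        // compression must stay under it.
        Properties broker = new Properties();
        broker.put("message.max.bytes", "1000000");

        // Consumer side: the fetch size must be able to hold a full wrapper.
        Properties consumer = new Properties();
        consumer.put("fetch.message.max.bytes", "1048576");
    }
}

As a back-of-the-envelope check, 1000 messages of 5 KB each that compress
roughly 5:1 produce a wrapper near 1 MB, which already sits at a typical
message.max.bytes setting; that is the "be careful on the batch size" caveat
at the end of the quoted message.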