Kafka >> mail # user >> 7.1 support for List<ProducerData>


Re: 7.1 support for List<ProducerData>
As for scalability being a fundamental aspect of Kafka's design and
implementation, besides the design doc, I guess this would be another
primary reference...

http://www.youtube.com/watch?v=Eq3i2m8aJBI

It's a pretty interesting video that touches on many aspects of Kafka, not
just scalability :)

--
Felix

On Tue, Aug 21, 2012 at 10:41 AM, Felix GV <[EMAIL PROTECTED]> wrote:

> What I meant is that Kafka was designed first and foremost as a
> high-throughput system. It achieves that with a couple of techniques,
> but mainly by batching events together so that it benefits from the
> lower overhead of sequential writes (as opposed to random access).
>
> Whether you publish synchronously or asynchronously does not change
> the fact that Kafka can achieve high throughput via batching.
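The batching idea described above can be sketched without any Kafka code. The class below is a hypothetical illustration (BatchingWriter and its methods are invented names, not part of Kafka): it buffers messages and records one "write" per full batch, so the number of writes drops roughly by a factor of the batch size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not a Kafka class.
final class BatchingWriter {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    // Each inner list stands in for one sequential append to a log.
    private final List<List<String>> writes = new ArrayList<>();

    BatchingWriter(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
    }

    // Buffer the message; write the whole batch once it is full.
    void send(String message) {
        pending.add(message);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Write whatever is buffered as a single batch (no-op when empty).
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        writes.add(new ArrayList<>(pending));
        pending.clear();
    }

    int writeCount() {
        return writes.size();
    }
}
```

With a batch size of 4, ten calls to send() followed by one flush() produce three writes instead of ten. A batch size of 1 degenerates to one write per message.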
>
> --
> Felix
>
>
>
>
> On Mon, Aug 20, 2012 at 10:55 PM, wm <[EMAIL PROTECTED]> wrote:
>
>> Felix, my regrets for confusing the matter. Please point me to a primary
>> source for the canonical use case you reference, unless that was scoped to
>> the Kafka community only. That sort of statement should be clearly
>> documented, IMHO.
>>
>> I am considering the matter closed with respect to this list. I have 3
>> publish options each with some degree of autonomy from the calling code's
>> designed behavior.
>>
>> regards
>>
>>
>> On 08/20/2012 02:39 PM, Felix GV wrote:
>>
>>> I think the difference is merely that async publishing is a non-blocking
>>> call, whereas sync publishing is a blocking call, meaning that the code
>>> that does a sync publish call could choose to have an alternate behavior
>>> if the publish failed, whereas the code that does an async publish would
>>> never know whether the publish succeeded or not.
>>>
>>> But like I said, in both cases, you can configure the batching size at
>>> the producer level, and a batching size greater than 1 will provide you
>>> with better throughput capabilities... In fact, I think this is the
>>> canonical use case Kafka was originally built for.
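The blocking versus non-blocking distinction above can be sketched as follows. Everything here is a hypothetical stand-in (SendModes, transmit, sendSync, sendAsync and shutdownAndWait are invented for illustration and are not Kafka's producer API); the pretend transport simply fails for the payload "bad".

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins; none of this is Kafka's API.
final class SendModes {
    // Pretend transport that fails for the payload "bad".
    static void transmit(String payload) {
        if ("bad".equals(payload)) {
            throw new IllegalStateException("publish failed");
        }
    }

    // Blocking publish: the caller learns the outcome and can react to it.
    static boolean sendSync(String payload) {
        try {
            transmit(payload);
            return true;
        } catch (IllegalStateException e) {
            return false; // caller may retry, log, or fall back
        }
    }

    // Non-blocking publish: hands the work to another thread and returns
    // at once. The failure is only counted on the worker side; the caller
    // gets nothing back.
    static void sendAsync(ExecutorService worker, AtomicInteger dropped, String payload) {
        worker.execute(() -> {
            try {
                transmit(payload);
            } catch (IllegalStateException e) {
                dropped.incrementAndGet();
            }
        });
    }

    // Helper: stop accepting work and wait for queued sends to finish.
    static boolean shutdownAndWait(ExecutorService worker) {
        worker.shutdown();
        try {
            return worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this sketch sendSync("bad") returns false to the caller, while sendAsync(..., "bad") returns normally and the failure only shows up later in the dropped counter.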
>>>
>>> --
>>> Felix
>>>
>>>
>>>
>>> On Mon, Aug 20, 2012 at 2:24 PM, will martin <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> My understanding is that async is not meant to be an immediate send. As
>>>> to batching, I've not delved into the code differences.
>>>>
>>>> But batching with the sync producer is not possible at the higher
>>>> Producer level; at least, I've tried it and had no success. The default
>>>> and string encoders cannot handle lists, although the documentation
>>>> suggests they can.
>>>>
>>>> I'm glad to be wrong on this, but I've had no luck getting the serializer
>>>> deep in the Scala code tree to accept a composite of any type containing
>>>> either Message or String. I can batch myself, but I doubt that is what
>>>> any of us think is the design goal?
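For illustration, "batching myself" could mean packing several strings into one payload before handing it to an encoder that only accepts a single value. The sketch below is an assumption about what such a workaround might look like, not what the poster actually did and not a Kafka encoder: StringListCodec, pack and unpack are invented names, and the length-prefixed layout is just one possible format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Kafka.
final class StringListCodec {
    // Layout: item count, then for each item its UTF-8 length and bytes.
    static byte[] pack(List<String> items) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(items.size());
            for (String item : items) {
                byte[] utf8 = item.getBytes(StandardCharsets.UTF_8);
                out.writeInt(utf8.length);
                out.write(utf8);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reverse of pack(): read the count, then each length-prefixed item.
    static List<String> unpack(byte[] payload) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload));
            int count = in.readInt();
            List<String> items = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                byte[] utf8 = new byte[in.readInt()];
                in.readFully(utf8);
                items.add(new String(utf8, StandardCharsets.UTF_8));
            }
            return items;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The cost of this approach is that the consumer must know about the packing and call unpack() itself, which is presumably why it does not feel like the intended design.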
>>>>
>>>>
>>>>
>>>> On Mon, Aug 20, 2012 at 1:06 PM, Felix GV <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> This may not be entirely related to what you're talking about, but why
>>>>> would an async producer not be able to meet your throughput needs, and
>>>>> a sync producer be able to?
>>>>>
>>>>> Both sync and async producers can be configured to batch more than one
>>>>> message together, and that's pretty much the main thing that's required
>>>>> to be able to achieve good throughput, AFAIK.
>>>>>
>>>>> ...?
>>>>>
>>>>> --
>>>>> Felix
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Aug 20, 2012 at 12:49 PM, will martin <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>
>>>>>> Thanks Neha. All my data is of 1 type. The serializer in place doesn't
>>>>>> seem to handle an array of String.
>>>>>>
>>>>>> The ProducerData I use is a collection of same types of data wrapped
>>>>>> in a single definition, as I read the spec. Am I to understand that
>>>>>> having a producer batch records itself is unsupported? The async
>>>>>> producer can't meet my throughput needs and, as I understand, is
>>>>>> targeted at