Re: 7.1 support for List<ProducerData>
What I meant is that Kafka has been designed first and foremost as a
high-throughput system. It achieves that with a couple of techniques,
but mainly by batching a bunch of events together so that it can benefit
from the lower overhead of writing sequentially (as opposed to random
access).

Whether you choose to publish synchronously or asynchronously does not
change the fact that Kafka can achieve high throughput via batching.

--
Felix

On Mon, Aug 20, 2012 at 10:55 PM, wm <[EMAIL PROTECTED]> wrote:

> Felix. My regrets for confusing the matter.  Please inform me of a primary
> source for the canonical use case you reference, unless that was scoped to
> the kafka community only. That sort of statement should be clearly
> documented imho.
>
> I am considering the matter closed with respect to this list. I have 3
> publish options each with some degree of autonomy from the calling code's
> designed behavior.
>
> regards
>
>
> On 08/20/2012 02:39 PM, Felix GV wrote:
>
>> I think the difference is merely that async publishing is a non-blocking
>> call, whereas sync publishing is a blocking call, meaning that the code
>> that does a sync publish call could choose to have an alternate behavior
>> if the publish failed, whereas the code that does an async publish would
>> never know whether the publish succeeded or not.
>>
>> But like I said, in both cases, you can configure the batching size at the
>> producer level, and a batching size greater than 1 will provide you with
>> better throughput capabilities... In fact, I think this is the canonical
>> use case Kafka was originally built for.
>>
>> --
>> Felix
>>
>>
>>
>> On Mon, Aug 20, 2012 at 2:24 PM, will martin <[EMAIL PROTECTED]>
>> wrote:
>>
>>> My understanding is that async is not meant to be an immediate send. As
>>> to batching, I've not delved into the code differences.
>>>
>>> But batching with the sync producer is not possible at the higher
>>> Producer level; at least, that is what I tried without success. The
>>> default and string encoders cannot handle lists, although the
>>> documentation suggests they can.
>>>
>>> I'm glad to be wrong on this; but I've had no luck with the serializer
>>> deep in the Scala code tree accepting a composite of any type containing
>>> either Message or String.  I can batch myself, but doubt this is what any
>>> of us think is the design goal?
>>>
>>>
>>>
>>> On Mon, Aug 20, 2012 at 1:06 PM, Felix GV <[EMAIL PROTECTED]> wrote:
>>>
>>>> This may not be entirely related to what you're talking about, but why
>>>> would an async producer not be able to meet your throughput needs, and a
>>>> sync producer be able to?
>>>>
>>>> Both sync and async producers can be configured to batch more than one
>>>> message together, and that's pretty much the main thing that's required
>>>> to be able to achieve good throughput, AFAIK.
>>>>
>>>> ...?
>>>>
>>>> --
>>>> Felix
>>>>
>>>>
>>>>
>>>> On Mon, Aug 20, 2012 at 12:49 PM, will martin <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>>> Thanks Neha. All my data is of 1 type. The serializer in place doesn't
>>>>> seem to handle an array of String.
>>>>>
>>>>> The ProducerData I use is a collection of same types of data wrapped
>>>>> in a single definition, as I read the spec.  Am I to understand that
>>>>> having a producer batch records itself is unsupported?  The async
>>>>> producer can't meet my throughput needs and, as I understand it, is
>>>>> targeted at implicit load balancing among different client machines.
>>>>>
>>>>> Additionally, the sync producer can meet my needs, but requires more
>>>>> use of the lower-level design features. For maintenance, it'd be great
>>>>> if I could create a list of Strings, create a
>>>>> ProducerData<String, List<String>> and have this be serialized.
>>>>>
>>>>> It occurs to me that the described  serialization may need my