Kafka, mail # user - 7.1 support for List<ProducerData>


Re: 7.1 support for List<ProducerData>
Felix GV 2012-08-22, 15:07
As for scalability being a fundamental aspect of Kafka's design and
implementation, besides the design doc, I guess this would be another
primary reference...

http://www.youtube.com/watch?v=Eq3i2m8aJBI

It's a pretty interesting video that touches on many aspects of Kafka, not
just scalability :)
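
To make the batching point from the quoted messages below concrete, here is
a minimal sketch in plain Java. It is not the Kafka producer API:
BatchingPublisher, its sink callback, and the batch size of 3 are all
hypothetical, chosen only for illustration. The sink stands in for one
sequential write, so the sketch only shows that N events turn into roughly
N / batchSize writes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical illustration of producer-side batching; not Kafka code. */
final class BatchingPublisher {
    private final int batchSize;
    // Stands in for a single sequential write of a whole batch.
    private final Consumer<List<String>> sink;
    private final List<String> buffer = new ArrayList<>();

    BatchingPublisher(int batchSize, Consumer<List<String>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one message and hands the batch to the sink once it is full. */
    synchronized void publish(String message) {
        buffer.add(message);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered, even if the batch is not full yet. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy so the sink owns its batch
        buffer.clear();
    }

    public static void main(String[] args) {
        List<List<String>> writes = new ArrayList<>();
        BatchingPublisher publisher = new BatchingPublisher(3, writes::add);
        for (int i = 0; i < 7; i++) {
            publisher.publish("event-" + i);
        }
        publisher.flush(); // push out the final partial batch
        System.out.println(writes.size() + " writes for 7 events");
    }
}
```

In this sketch the sink runs on the caller's thread, so a failure in the
sink surfaces to the caller, which is the sync-style behaviour described in
the quoted messages. An async variant would hand batches to a background
thread instead, and the caller would not see the outcome.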

--
Felix

On Tue, Aug 21, 2012 at 10:41 AM, Felix GV <[EMAIL PROTECTED]> wrote:

> What I meant is that Kafka has been designed first and foremost as a
> high-throughput system, and it achieves that with a couple of techniques,
> but mainly by batching a bunch of events together so that it can benefit
> from the lower overhead of writing sequentially (as opposed to random
> access).
>
> Whether you choose to publish synchronously or asynchronously should not
> change the fact that Kafka can achieve high throughput via batching.
>
> --
> Felix
>
>
>
>
> On Mon, Aug 20, 2012 at 10:55 PM, wm <[EMAIL PROTECTED]> wrote:
>
>> Felix. My regrets for confusing the matter. Please inform me of a primary
>> source for the canonical use case you reference, unless that was scoped to
>> the kafka community only. That sort of statement should be clearly
>> documented imho.
>>
>> I am considering the matter closed with respect to this list. I have 3
>> publish options each with some degree of autonomy from the calling code's
>> designed behavior.
>>
>> regards
>>
>>
>> On 08/20/2012 02:39 PM, Felix GV wrote:
>>
>>> I think the difference is merely that async publishing is a non-blocking
>>> call, whereas sync publishing is a blocking call, meaning that the code
>>> that does a sync publish call could choose to have an alternate behavior
>>> if the publish failed, whereas the code that does an async publish would
>>> never know whether the publish succeeded or not.
>>>
>>> But like I said, in both cases, you can configure the batching size at
>>> the producer level, and a batching size greater than 1 will provide you
>>> with better throughput capabilities... In fact, I think this is the
>>> canonical use case Kafka was originally built for.
>>>
>>> --
>>> Felix
>>>
>>>
>>>
>>> On Mon, Aug 20, 2012 at 2:24 PM, will martin <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> My understanding is that async is not meant to be an immediate send. As
>>>> to batching, I've not delved into the code differences.
>>>>
>>>> But batching the sync is not possible at the higher Producer level; at
>>>> least that's what I've tried and had no success with. The default and
>>>> string encoders cannot handle lists, although the documentation suggests
>>>> they can.
>>>>
>>>> I'm glad to be wrong on this; but I've had no luck with the serializer
>>>> deep in the Scala code tree accepting a composite of any type containing
>>>> either Message or String. I can batch myself, but doubt this is what any
>>>> of us think is the design goal?
>>>>
>>>>
>>>>
>>>> On Mon, Aug 20, 2012 at 1:06 PM, Felix GV <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> This may not be entirely related to what you're talking about, but why
>>>>> would an async producer not be able to meet your throughput needs, and
>>>>> a sync producer be able to?
>>>>>
>>>>> Both sync and async producers can be configured to batch more than one
>>>>> message together, and that's pretty much the main thing that's required
>>>>> to be able to achieve good throughput, AFAIK.
>>>>>
>>>>> ...?
>>>>>
>>>>> --
>>>>> Felix
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Aug 20, 2012 at 12:49 PM, will martin <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>
>>>>>> Thanks Neha. All my data is of 1 type. The serializer in place doesn't
>>>>>> seem to handle an array of String.
>>>>>>
>>>>>> The ProducerData I use is a collection of same types of data wrapped
>>>>>> in a single definition, as I read the spec. Am I to understand that
>>>>>> having a producer batch records itself is unsupported? The async
>>>>>> producer can't meet my throughput needs and as I understand is
>>>>>> targeted at