Re: Of BatchSize / Channel Capacity / Transaction Capacity
Bhaskar,

I have created the following jira for this:
https://issues.apache.org/jira/browse/FLUME-1829

-Jeff
On Fri, Jan 11, 2013 at 6:48 AM, Bhaskar V. Karambelkar <[EMAIL PROTECTED]> wrote:

> Thanks Jeff,
> Clear and detailed explanations. These deserve to be on the wiki, as these
> parameters have direct implications on the performance of flume nodes.
>
> thanks
> Bhaskar
>
>
> On Tue, Jan 8, 2013 at 9:40 PM, Jeff Lord <[EMAIL PROTECTED]> wrote:
>
>> Hi Bhaskar,
>>
>> 1) Batch Size
>>   1.a) When configured by client code using the flume-core-sdk, to send
>> events to a Flume Avro source.
>> The Flume client SDK has an appendBatch method. It takes a list of events
>> and sends them to the source as a single batch. The batch size is the
>> number of events passed to the source at one time.
>>
>>   1.b) When set as a parameter on the HDFS sink (or other sinks which
>> support a BatchSize parameter)
>> This is the number of events written to the file before it is flushed to
>> HDFS.
>>
>> 2)
>>   2.a) Channel Capacity
>> This is the maximum number of events the channel can hold.
>>
>>   2.b) Channel Transaction Capacity.
>> This is the maximum number of events that can be put into or taken from
>> the channel in a single transaction.
>>
>> How will setting these parameters to different values affect throughput
>> and latency in event flow?
>>
>> In general you will see better throughput with the memory channel than
>> with the file channel, at the cost of durability.
>>
>> The channel capacity needs to be large enough to hold as many events as
>> upstream agents will add to it. Ideally, the sink drains events from the
>> channel faster than the source adds them.
>>
>> The channel transaction capacity will need to be smaller than the channel
>> capacity.
>> E.g., if your Channel Capacity is set to 10000, then Channel Transaction
>> Capacity should be set to something like 100.
>>
>> Specifically if we have clients with varying frequency of event
>> generation, i.e. some clients generating thousands of events/sec, while
>> others at a much slower rate, what effect will different values of these
>> params have on these clients?
>>
>> Transaction Capacity is what throttles or limits how many events the
>> source can put into the channel. This is going to vary depending on how
>> many tiers of agents/collectors you have set up.
>> In general, though, this should probably be equal to whatever you have the
>> batch size set to in your client.
>>
>> With regard to the HDFS batch size, the larger your batch size, the
>> better performance will be. However, keep in mind that if a transaction
>> fails, the entire transaction will be replayed, which can result in
>> duplicate events downstream.
>>
>> -Jeff
>>
>>
>>
>>
>> On Tue, Jan 8, 2013 at 10:46 AM, Bhaskar V. Karambelkar <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Can someone explain the importance of the following:
>>> 1) Batch Size
>>>   1.a) When configured by client code using the flume-core-sdk, to send
>>> events to a Flume Avro source.
>>>   1.b) When set as a parameter on the HDFS sink (or other sinks which
>>> support a BatchSize parameter)
>>> 2)
>>>   2.a) Channel Capacity
>>>   2.b) Channel Transaction Capacity.
>>>
>>>
>>> Under which conditions should these params be set to high values, and
>>> under which conditions should they be set to low values?
>>>
>>>
>>> How will setting these parameters to different values affect
>>> throughput and latency in event flow?
>>> Specifically, if we have clients with varying frequency of event
>>> generation, i.e. some clients generating thousands of events/sec, while
>>> others at a much slower rate, what effect will different values of these
>>> params have on these clients?
>>>
>>> thanks
>>> Bhaskar
>>>
>>
>>
>
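
The sizing advice quoted above can be illustrated with a small Python
sketch. This is a toy model and not Flume code: ToyChannel,
ChannelLimitError and split_into_batches are hypothetical names, and the
real channels are more involved. The numbers (capacity 10000, transaction
capacity 100) are the ones suggested in the reply, and the client batch
size of 100 follows the suggestion to match it to the transaction capacity.

```python
from collections import deque


class ChannelLimitError(Exception):
    """Raised when a transaction would exceed a channel limit (hypothetical name)."""


class ToyChannel:
    """Toy in-memory channel. An illustration only, not Flume's implementation."""

    def __init__(self, capacity, transaction_capacity):
        # The reply recommends a transaction capacity smaller than the capacity.
        if transaction_capacity > capacity:
            raise ValueError("transaction capacity must not exceed capacity")
        self.capacity = capacity
        self.transaction_capacity = transaction_capacity
        self._events = deque()

    def __len__(self):
        return len(self._events)

    def put_batch(self, batch):
        """Source side: commit the whole batch or nothing."""
        if len(batch) > self.transaction_capacity:
            raise ChannelLimitError("batch exceeds transaction capacity")
        if len(self._events) + len(batch) > self.capacity:
            raise ChannelLimitError("channel is full")
        self._events.extend(batch)

    def take_batch(self, batch_size):
        """Sink side: remove up to batch_size events in one transaction."""
        # Simplification: reject an oversized request up front.
        if batch_size > self.transaction_capacity:
            raise ChannelLimitError("batch exceeds transaction capacity")
        count = min(batch_size, len(self._events))
        return [self._events.popleft() for _ in range(count)]


def split_into_batches(events, batch_size):
    """Client side: split events into lists of at most batch_size."""
    return [events[i:i + batch_size] for i in range(0, len(events), batch_size)]


channel = ToyChannel(capacity=10000, transaction_capacity=100)
events = [f"event-{i}" for i in range(250)]

# Client batch size equals the transaction capacity, so every put fits.
for batch in split_into_batches(events, batch_size=100):
    channel.put_batch(batch)  # batches of 100, 100 and 50 events

drained = channel.take_batch(batch_size=100)
print(len(channel), len(drained))  # 150 100
```

In this model, a client batch larger than the transaction capacity is
rejected outright, and a channel whose sink drains more slowly than its
source fills it eventually rejects puts because it is full. That is why
the reply sizes the capacity for the expected upstream load.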