Re: Of BatchSize / Channel Capacity / Transaction Capacity
Bhaskar V. Karambelkar 2013-01-11, 14:48
Clear and detailed explanations. These deserve to be on the wiki, as these
parameters have direct implications on the performance of flume nodes.
On Tue, Jan 8, 2013 at 9:40 PM, Jeff Lord <[EMAIL PROTECTED]> wrote:
> Hi Bhaskar,
> 1) Batch Size
> 1.a) When configured by client code using the flume-core-sdk, to send
> events to a Flume Avro source.
> The Flume client SDK has an appendBatch method. This takes a list of
> events and sends them to the source as a batch. The batch size is the
> number of events passed to the source at one time.
> 1.b) When set as a parameter on HDFS sink (or other sinks which support
> BatchSize parameter)
> This is the number of events written to the file before it is flushed to HDFS.
> 2.a) Channel Capacity
> This is the maximum number of events the channel can hold.
> 2.b) Channel Transaction Capacity.
> This is the max number of events stored in the channel per transaction.
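To make the client-side batch size in 1.a concrete, here is a minimal sketch in Python. It is not the Flume SDK; `batches` is a hypothetical helper that only shows how a stream of events is cut into lists of at most `batch_size`, each of which a client would hand to a call like appendBatch.

```python
from typing import Iterable, Iterator, List


def batches(events: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists holding at most batch_size events each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[str] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # The last batch may be smaller than batch_size.
        yield batch


# Hypothetical workload: 250 events sent with a batch size of 100.
events = [f"event-{i}" for i in range(250)]
sizes = [len(b) for b in batches(events, 100)]
print(sizes)
```

With 250 events and a batch size of 100 the client makes three calls, the last one carrying the 50 leftover events.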
> How will setting these parameters to different values, affect throughput,
> latency in event flow?
> In general you will see better throughput with the memory channel than
> with the file channel, at the cost of durability.
> The channel capacity is going to need to be sized such that it is large
> enough to hold as many events as will be added to it by upstream agents.
> Ideal flow would see the sink draining events from the channel faster than
> it is having events added by its source.
> The channel transaction capacity will need to be smaller than the channel
> capacity, e.g. if your channel capacity is set to 10000 then the channel
> transaction capacity should be set to something like 100.
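The relationship between the two limits can be sketched with a toy model. This is plain Python, not Flume code; `ToyChannel` and its methods are made up for illustration, and the numbers are the 10000 / 100 example from the reply. In an agent configuration the corresponding memory channel properties are `capacity` and `transactionCapacity`.

```python
from collections import deque


class ToyChannel:
    """Toy stand-in for a channel with a total and a per-transaction limit."""

    def __init__(self, capacity: int, transaction_capacity: int):
        if transaction_capacity > capacity:
            raise ValueError("transaction capacity must not exceed capacity")
        self.capacity = capacity
        self.transaction_capacity = transaction_capacity
        self.queue = deque()

    def put_batch(self, batch):
        """Source side: add one transaction's worth of events."""
        if len(batch) > self.transaction_capacity:
            raise ValueError("batch exceeds transaction capacity")
        if len(self.queue) + len(batch) > self.capacity:
            raise OverflowError("channel is full")
        self.queue.extend(batch)

    def take_batch(self, n: int):
        """Sink side: remove up to n events in one transaction."""
        if n > self.transaction_capacity:
            raise ValueError("take exceeds transaction capacity")
        n = min(n, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]


channel = ToyChannel(capacity=10000, transaction_capacity=100)
for _ in range(100):
    channel.put_batch(["e"] * 100)  # 100 transactions of 100 events
full = len(channel.queue)

try:
    channel.put_batch(["e"] * 100)  # no room left until a sink drains
    overflowed = False
except OverflowError:
    overflowed = True

try:
    channel.put_batch(["e"] * 101)  # larger than one transaction allows
    too_big = False
except ValueError:
    too_big = True

print(full, overflowed, too_big)
```

In this model a source that outpaces the sink eventually hits the capacity limit, while a client batch larger than the transaction capacity is rejected no matter how empty the channel is.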
> Specifically if we have clients with varying frequency of event
> generation, i.e. some clients generating thousands of events/sec, while
> others at a much slower rate, what effect will different values of these
> params have on these clients ?
> Transaction Capacity is going to be what throttles or limits how many
> events the source can put into the channel. This is going to vary depending on
> how many tiers of agents/collectors you have set up.
> In general though this should probably be equal to whatever you have the
> batch size set to in your client.
> With regard to the HDFS batch size, the larger your batch size the better
> performance will be. However, keep in mind that if a transaction fails the
> entire transaction will be replayed which could have the implication of
> duplicate events downstream.
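The duplicate-event trade-off can be illustrated with a small simulation. This is a toy model in Python rather than Flume's actual transaction code; `deliver` and its `fail_after` parameter are hypothetical. It assumes the sink has already written part of a batch when the transaction fails, after which the whole batch is replayed.

```python
def deliver(events, batch_size, fail_after):
    """Write events in batches; the first attempt at the first batch fails
    after fail_after successful writes and the whole batch is then replayed."""
    written = []
    failed_once = False
    i = 0
    while i < len(events):
        batch = events[i:i + batch_size]
        try:
            for n, event in enumerate(batch):
                if not failed_once and n == fail_after:
                    failed_once = True
                    raise OSError("simulated write failure")
                written.append(event)
        except OSError:
            # Rollback: the batch stays in the channel and is retried in full.
            continue
        i += len(batch)  # commit: move on to the next batch
    return written


events = list(range(10))

small = deliver(events, batch_size=5, fail_after=3)
large = deliver(events, batch_size=10, fail_after=8)

dup_small = len(small) - len(set(small))
dup_large = len(large) - len(set(large))
print(dup_small, dup_large)
```

Every event still arrives at least once, but the events written before the failure arrive twice. The number of possible duplicates per failure is bounded by the batch size, which is the cost of the larger batches that improve throughput.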
> On Tue, Jan 8, 2013 at 10:46 AM, Bhaskar V. Karambelkar <
> [EMAIL PROTECTED]> wrote:
>> Can someone explain the importance of the following:
>> 1) Batch Size
>> 1.a) When configured by client code using the flume-core-sdk , to send
>> events to flume avro source.
>> 1.b) When set as a parameter on HDFS sink (or other sinks which support
>> BatchSize parameter)
>> 2.a) Channel Capacity
>> 2.b) Channel Transaction Capacity.
>> Under which conditions should these params be set to high values, and
>> under which conditions should they be set to low values?
>> How will setting these parameters to different values affect throughput
>> and latency in event flow?
>> Specifically if we have clients with varying frequency of event
>> generation, i.e. some clients generating thousands of events/sec, while
>> others at a much slower rate, what effect will different values of these
>> params have on these clients ?