Flume, mail # user - Re: checkpoint lifecycle - 2014-01-30, 14:40

On Thu, Jan 30, 2014 at 8:16 AM, Umesh Telang <[EMAIL PROTECTED]> wrote:
That is not enough heap for 150M events. It's 150 million * 32 bytes,
roughly 4.5GB, plus say 100-500MB for the rest of Flume.
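
As a minimal sketch, the corresponding heap setting would go in
conf/flume-env.sh; the 5g figure here is an assumption (~4.5GB for the
queue plus headroom for the rest of Flume), not a recommendation:

  # flume-env.sh: size the heap for the file channel's in-memory queue
  # (150 million events * 32 bytes ~= 4.5GB) plus the rest of Flume
  JAVA_OPTS="-Xms5g -Xmx5g"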

The file channel at present cannot utilize an entire disk from an I/O
perspective, which is why I suggest multiple disks. Of course you'll want
to ensure that you have enough disk to support a full channel, but that is
a different discussion (avg event size * channel size).
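
For example, a multi-disk file channel config might look like the sketch
below; the agent name (a1), channel name (c1), and paths are placeholders:

  a1.channels.c1.type = file
  a1.channels.c1.checkpointDir = /disk1/flume/checkpoint
  # one data dir per physical disk so writes are spread across spindles
  a1.channels.c1.dataDirs = /disk1/flume/data,/disk2/flume/data,/disk3/flume/data
  # channel size; disk needed is roughly avg event size * capacity
  a1.channels.c1.capacity = 150000000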

Automatic. If Flume is killed or shut down during a checkpoint, that
checkpoint is invalid, and unless a backup checkpoint exists a full replay
will have to take place. Furthermore, without FLUME-2155, full replays are
very time-consuming under certain conditions.
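
Enabling a backup checkpoint is a channel config change, roughly like this
(again a1/c1 and the path are placeholders, and this assumes a Flume
version with dual-checkpoint support):

  # keep a backup copy of the last good checkpoint so that an
  # interrupted checkpoint does not force a full replay
  a1.channels.c1.useDualCheckpoints = true
  # must not be the same as checkpointDir or the data dirs
  a1.channels.c1.backupCheckpointDir = /disk2/flume/backup-checkpoint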

It's not purely about channel size. Specifically, it's about:

1) Large channel size
2) Having a large number of events in your channel (queue depth)
3) Having run the channel for some time such that old WALs were cleaned up
(causing there to be removes for which no event exists)
4) Performing a full replay in these conditions

Generally I wouldn't go over a 1M channel size without a backup checkpoint,
this change (FLUME-2155), or both. There are more details here:

https://issues.apache.org/jira/browse/FLUME-2155?focusedCommentId=13841465&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13841465

Brock

 