Flume, mail # user - Guarantees of the memory channel for delivering to sink


Re: Guarantees of the memory channel for delivering to sink
Roshan Naik 2012-11-07, 22:57
Rahul,

> If we choose to use file channel with this source, we will result in double
> writes to disk, correct? (one for the legacy log files which will be
> ingested by the Spool Directory source, and the other for the WAL)

Yes, going with the file channel will lead to double disk writes. For
your use case, I think you could use the memory channel instead, if you
can live with a "small" amount of data loss. A smaller memory channel
capacity helps limit that loss. For this to work reasonably well, the
source would need the ability to resume (on restart) from the last event
it committed into the channel. The data lost in a crash would then be
limited to whatever was sitting in the memory channel, i.e. at most its
capacity, and you would avoid the double disk I/O.
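
As a rough sketch of what I mean, a config along these lines would bound
the loss. The agent and component names (a1, src1, ch1), the spool path
and the capacity numbers are only placeholders I made up for illustration,
and the sink is left out:

```properties
# Hypothetical agent/component names; all values are illustrative.
a1.sources = src1
a1.channels = ch1

# Spool Directory source reading the legacy log files (path is an assumption).
a1.sources.src1.type = spooldir
a1.sources.src1.spoolDir = /var/log/legacy
a1.sources.src1.channels = ch1

# Small memory channel: at most `capacity` events are buffered in RAM,
# so at most that many events can be lost if the agent crashes.
a1.channels.ch1.type = memory
a1.channels.ch1.capacity = 1000
a1.channels.ch1.transactionCapacity = 100
```

The trade-off is that a smaller capacity also means the source gets
blocked sooner whenever the sink falls behind.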

I don't know whether the Spool Directory source knows precisely where to
resume from after a restart (following a crash). Brock?
-roshan