Flume user mailing list: Import files from a directory on remote machine
Re: Import files from a directory on remote machine
Jeff Lord 2014-04-23, 15:15
Hi Otis,

This is pretty clearly stated in the docs.
For production, we would typically recommend the Spooling Directory Source
as an alternative to the Exec Source.
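The Flume User Guide says the Spooling Directory Source expects each file placed in the watched directory to be complete, never modified afterwards, and never given a reused name. The source reads from a directory local to the agent, so files from a remote machine must first be copied there. Below is a minimal sketch of the producer side, under these assumptions: the helper `deliver_to_spool` and the directory names are hypothetical and not part of Flume, and the staging directory sits on the same filesystem as the spool directory, so the final rename is atomic on POSIX systems.

```python
import os
import tempfile
import uuid


def deliver_to_spool(data: bytes, spool_dir: str, staging_dir: str) -> str:
    """Hypothetical helper: hand one complete file to a directory watched
    by a Flume spooldir source.

    Assumes staging_dir and spool_dir are on the same filesystem, so that
    os.replace() is an atomic rename and the agent never sees a partial file.
    """
    # Unique name per file, since the spooldir source expects names not to be reused.
    name = f"{uuid.uuid4().hex}.log"
    staging_path = os.path.join(staging_dir, name)

    # Write and fsync the whole file outside the watched directory first.
    with open(staging_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # Move it into the spool directory in one step; it is never written again.
    final_path = os.path.join(spool_dir, name)
    os.replace(staging_path, final_path)
    return final_path


if __name__ == "__main__":
    # Demo with throwaway directories standing in for the real spool/staging paths.
    with tempfile.TemporaryDirectory() as root:
        spool = os.path.join(root, "spool")
        stage = os.path.join(root, "stage")
        os.makedirs(spool)
        os.makedirs(stage)
        print(deliver_to_spool(b"line 1\nline 2\n", spool, stage))
```

Writing to a staging directory first means that a crash mid-write leaves debris only outside the watched directory, never a half-written file that the source would start ingesting.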

http://flume.apache.org/FlumeUserGuide.html#exec-source

"Warning The problem with ExecSource and other asynchronous sources is that
the source can not guarantee that if there is a failure to put the event
into the Channel the client knows about it. In such cases, the data will be
lost. As a for instance, one of the most commonly requested features is the
tail -F [file]-like use case where an application writes to a log file on
disk and Flume tails the file, sending each line as an event. While this is
possible, there's an obvious problem; what happens if the channel fills up
and Flume can't send an event? Flume has no way of indicating to the
application writing the log file that it needs to retain the log or that
the event hasn't been sent, for some reason. If this doesn't make sense,
you need only know this: Your application can never guarantee data has been
received when using a unidirectional asynchronous interface such as
ExecSource! As an extension of this warning - and to be completely clear -
there is absolutely zero guarantee of event delivery when using this
source. For stronger reliability guarantees, consider the Spooling
Directory Source or direct integration with Flume via the SDK."
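The contrast the warning draws can be shown with a toy model. This is not Flume code: a bounded `queue.Queue` stands in for the channel, and both function names are hypothetical. With a one-way, fire-and-forget interface, a full channel can only drop events, and the writer never finds out. With an acknowledged interface, the caller learns which events were not accepted and can keep them for a retry.

```python
import queue
from typing import List


def exec_style_send(lines: List[str], channel: "queue.Queue[str]") -> int:
    """Fire-and-forget, like a tail -F pipe: the writer gets no feedback.

    Returns how many lines were silently lost because the channel was full.
    """
    lost = 0
    for line in lines:
        try:
            channel.put_nowait(line)
        except queue.Full:
            # Nothing upstream is told about this; the line is simply gone.
            lost += 1
    return lost


def acknowledged_send(lines: List[str], channel: "queue.Queue[str]") -> List[str]:
    """Acknowledged delivery: stop at the first rejected put and return every
    line not yet accepted, so the caller can retain it and retry later."""
    for i, line in enumerate(lines):
        try:
            channel.put_nowait(line)
        except queue.Full:
            return lines[i:]
    return []
```

For example, with a channel of capacity 3 and five lines, `exec_style_send` loses two lines without trace, while `acknowledged_send` hands back `["l4", "l5"]` for the caller to resend.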

-Jeff
On Wed, Apr 23, 2014 at 6:48 AM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: