
Flume >> mail # user >> Uncaught Exception When Using Spooling Directory Source

Re: Uncaught Exception When Using Spooling Directory Source
Thank you very much, Connor! It is really helpful.
On Fri, Jan 18, 2013 at 5:13 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:

> The Spooling Directory Source is best used for sending old data / backups
> through Flume, as opposed to trying to use it for real-time data (since, as
> you discovered, you aren't supposed to write directly to a file in that
> directory, but rather place closed files there). You could implement
> what Mike mentioned above about rolling the logs into the spooling
> directory, but there are other options.
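For reference, a minimal flume.conf sketch of a spooling-directory setup; the agent name `a1`, the spool path, and the channel settings are all hypothetical, and the Flume User Guide linked below lists the full set of options:

```
# Hypothetical flume.conf fragment: agent "a1" picks up closed files
# dropped into /var/log/flume-spool.
a1.sources = src1
a1.channels = ch1
a1.sources.src1.type = spooldir
a1.sources.src1.spoolDir = /var/log/flume-spool
a1.sources.src1.channels = ch1
a1.channels.ch1.type = memory
```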
> If you are looking to pull data in real time, the Exec Source
> <http://flume.apache.org/FlumeUserGuide.html#exec-source> mentioned above
> does work. The one downside is that this source is not the most reliable,
> as is mentioned in the red box in that link, and you will have to monitor
> it to make sure it hasn't crashed. However, other than the Spooling
> Directory source and any custom source you write, this is the only other
> pulling source.
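As a sketch, an Exec Source tailing a live file might be configured like this (agent name, file path, and channel are hypothetical; the reliability caveat above still applies):

```
# Hypothetical flume.conf fragment: tail a live log with the Exec Source.
# If the tail process dies, events are silently lost until it is restarted.
a1.sources = src1
a1.channels = ch1
a1.sources.src1.type = exec
a1.sources.src1.command = tail -F /var/log/app.log
a1.sources.src1.channels = ch1
a1.channels.ch1.type = memory
```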
> But depending on how your system is set up, you could instead push your
> logs into Flume. Here are some options:
> If the log files you want to capture use Log4J, then there is a Log4J
> Appender <http://flume.apache.org/FlumeUserGuide.html#log4j-appender>
> which will send events directly to Flume. The benefit of this is that you
> let Flume take control of the events right as they are generated; they are
> sent through Avro to your specified host/port, where you will have a Flume
> agent with an Avro Source
> <http://flume.apache.org/FlumeUserGuide.html#flume-sources> running.
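A sketch of the two halves, with hypothetical host/port values: the application's log4j.properties points the appender at the collector, and the collector's flume.conf runs a matching Avro Source:

```
# Application side (log4j.properties): send log events to the Flume agent.
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = collector.example.com
log4j.appender.flume.Port = 41414
log4j.rootLogger = INFO, flume

# Collector side (flume.conf): Avro Source listening on the same port.
a1.sources.avro1.type = avro
a1.sources.avro1.bind = 0.0.0.0
a1.sources.avro1.port = 41414
a1.sources.avro1.channels = ch1
```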
> Another alternative, if you don't use Log4J but do have direct control
> over the application, is to use the Embedded Flume Agent
> <https://github.com/apache/flume/blob/trunk/flume-ng-doc/sphinx/FlumeDeveloperGuide.rst#embedded-agent>.
> This is even more powerful than the Log4J appender, as you have more
> control over how it works and you are able to use Flume channels with it.
> This would end up pushing events via Avro to your Flume agent to then
> collect/process/store.
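The embedded agent is configured in code through a plain property map rather than a flume.conf file; a sketch of the entries (hostname/port hypothetical), following the format shown in the Developer Guide linked above:

```
# Properties passed to EmbeddedAgent.configure() as a Map<String, String>:
channel.type = memory
channel.capacity = 200
sinks = sink1
sink1.type = avro
sink1.hostname = collector.example.com
sink1.port = 5565
processor.type = default
```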
> There are also a variety of network methods that can communicate with
> Flume. Flume supports listening on a specified port with the Netcat Source
> <http://flume.apache.org/FlumeUserGuide.html#netcat-source>, receiving
> events via HTTP POST
> <http://flume.apache.org/FlumeUserGuide.html#http-source> messages, and if
> your application uses Syslog, that's supported
> <http://flume.apache.org/FlumeUserGuide.html#syslog-sources> as well.
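For example, a Netcat Source sketch (port and names hypothetical) that turns each line of text sent to the port into one Flume event:

```
# Hypothetical flume.conf fragment: one event per line received on port 44444.
a1.sources = nc1
a1.channels = ch1
a1.sources.nc1.type = netcat
a1.sources.nc1.bind = 0.0.0.0
a1.sources.nc1.port = 44444
a1.sources.nc1.channels = ch1
a1.channels.ch1.type = memory
```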
> In summary, if you need to set up a pulling system, you will need to
> place a Flume agent on each of your servers and have it use a Spooling
> Directory or Exec source; or, if your system is configurable enough, you
> can modify it in various ways to push the logs to Flume.
> I hope some of that was helpful,
> - Connor
> On Fri, Jan 18, 2013 at 12:18 AM, Henry Ma <[EMAIL PROTECTED]> wrote:
>> We have an advertisement system, which owns hundreds of servers running
>> services such as Resin/Nginx, and each of them generates log files to a
>> local location every second. What we need is to collect all the log files
>> in time to a central storage such as MooseFS for real-time analysis, and
>> then archive them to HDFS every hour.
>> We want to deploy Flume to collect log files, as soon as they are
>> generated, from nearly one hundred servers (servers may be added or
>> removed at any time) to a central location, and then archive them to
>> HDFS each hour.
>> For now, the log files cannot be pushed to any collecting system; we
>> need the collecting system to PULL all of them remotely.
>> Can you give me some guide? Thanks!
>> On Fri, Jan 18, 2013 at 3:45 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
>>> Can you provide more detail about what kinds of services?
>>> If you roll the logs every 5 minutes or so then you can configure the
>>> spooling source to pick them up once they are rolled by either rolling them
Best Regards,
网易有道 (NetEase Youdao) EAD-Platform
MOBILE: 18600601996