Flume >> mail # user >> Uncaught Exception When Using Spooling Directory Source


Re: Uncaught Exception When Using Spooling Directory Source
We have an advertising system that runs on hundreds of servers hosting
services such as resin/nginx, and each server writes log files to a
local directory every second. We need to collect all of these log files
promptly into central storage such as MooseFS for real-time analysis, and
then archive them to HDFS every hour.

We want to deploy Flume to collect log files as soon as they are generated
on nearly one hundred servers (servers may be added or removed at any
time), deliver them to a central location, and then archive them to HDFS
each hour.

At present the log files cannot be pushed to any collecting system. We want
the collecting system to PULL all of them remotely.

Can you give me some guidance? Thanks!
On Fri, Jan 18, 2013 at 3:45 PM, Mike Percy <[EMAIL PROTECTED]> wrote:

> Can you provide more detail about what kinds of services?
>
> If you roll the logs every 5 minutes or so then you can configure the
> spooling source to pick them up once they are rolled by either rolling them
> into a directory for immutable files or using the trunk version of the
> spooling file source to specify a filter to ignore files that don't match a
> "rolled" pattern.
>
> You could also use the exec source with "tail -F", but that is much less
> reliable than the spooling file source.
>
> Regards,
> Mike
>
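
The "rolled" pattern idea above can be sketched in Python. This is not Flume's own filter; it only illustrates the matching step a filter would perform. The regular expression is an assumption based on the file name that appears in the log output later in this thread (name.log.<start>-<end>.<suffix>, with 14-digit timestamps), and is_rolled and rolled_files are hypothetical helper names.

```python
import re
from pathlib import Path

# Assumed naming convention for a rolled file:
#   <name>.log.<start>-<end>.<suffix>, where start/end are 14-digit timestamps.
# A file still being written is assumed to lack the timestamp range.
ROLLED = re.compile(r"\.log\.\d{14}-\d{14}\.")


def is_rolled(name: str) -> bool:
    """Return True if the file name looks like a finished (rolled) log."""
    return ROLLED.search(name) is not None


def rolled_files(directory):
    """List rolled log files in a directory, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and is_rolled(p.name)
    )
```

Only files that pass is_rolled would be handed to the collector; anything else is treated as still being written and is left alone.
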
>
> On Thu, Jan 17, 2013 at 10:23 PM, Henry Ma <[EMAIL PROTECTED]> wrote:
>
>> OK, thank you very much, now I know why the problem occurs.
>>
>> I am a newcomer to Flume. Here is my scenario: using Flume to collect logs
>> from hundreds of directories on dozens of servers into central storage.
>> It seems that the spooling directory source may not be the best choice. Can
>> someone give me some advice on how to design the architecture? Which
>> types of source and sink would fit?
>>
>> Thanks!
>>
>>
>> On Fri, Jan 18, 2013 at 2:05 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Henry,
>>> The files must be immutable before putting them into the spooling
>>> directory. So if you copy them from a different file system then you can
>>> run into this issue. The right way to do it is to copy them to the same
>>> file system and then atomically move them into the spooling directory.
>>>
>>> Regards,
>>> Mike
>>>
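
The copy-then-move step described above could look roughly like the following Python sketch. It assumes a staging directory on the same file system as the spooling directory; deliver, staging_dir and spool_dir are hypothetical names, not part of Flume. os.rename is atomic on POSIX systems only when source and destination are on the same file system, so the sketch compares device IDs before doing anything.

```python
import os
import shutil
from pathlib import Path


def deliver(src, staging_dir, spool_dir):
    """Copy src into staging_dir, then atomically move it into spool_dir.

    The slow copy happens outside the spooling directory, so a reader of
    spool_dir only ever sees the complete file.
    """
    src, staging_dir, spool_dir = Path(src), Path(staging_dir), Path(spool_dir)

    # A rename across file systems is not atomic, so refuse to proceed.
    if os.stat(staging_dir).st_dev != os.stat(spool_dir).st_dev:
        raise ValueError("staging_dir and spool_dir must share a file system")

    final = spool_dir / src.name
    if final.exists():
        # Assumption: file names in the spooling directory are never reused.
        raise FileExistsError(final)

    staged = staging_dir / src.name
    shutil.copyfile(src, staged)   # slow step, done in staging
    os.rename(staged, final)       # atomic step, file appears fully written
    return final
```

A call such as deliver("/remote/mount/app.log.1", "/data/staging", "/data/spool") (hypothetical paths) keeps the partially copied file out of the spooling directory until it is complete.
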
>>>
>>> On Thu, Jan 17, 2013 at 9:59 PM, Henry Ma <[EMAIL PROTECTED]> wrote:
>>>
>>>> Thank you very much! I cleaned all the related directories and restarted.
>>>> I kept the source spooling dir empty, started Flume, and then put some
>>>> files into the spooling dir. But this time a new error occurred:
>>>>
>>>> 13/01/18 13:44:24 INFO avro.SpoolingFileLineReader: Preparing to move file /disk2/mahy/FLUME_TEST/source/sspstat.log.20130118112700-20130118112800.hs016.ssp to /disk2/mahy/FLUME_TEST/source/sspstat.log.20130118112700-20130118112800.hs016.ssp.COMPLETED
>>>> 13/01/18 13:44:24 ERROR source.SpoolDirectorySource: Uncaught exception in Runnable
>>>> java.lang.IllegalStateException: File has changed size since being read: /disk2/mahy/FLUME_TEST/source/sspstat.log.20130118112700-20130118112800.hs016.ssp
>>>>         at org.apache.flume.client.avro.SpoolingFileLineReader.retireCurrentFile(SpoolingFileLineReader.java:241)
>>>>         at org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:185)
>>>>         at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
Best Regards,
马环宇
网易有道 EAD-Platform
POPO:   [EMAIL PROTECTED]
MSN:    [EMAIL PROTECTED]
MOBILE: 18600601996