Re: spool directory configuration problem
Hello Venkat,

Your question is more appropriate for the user mailing list, so I have
changed the list in this reply.

Going forward, you can use the following as a guide when sending emails to
the lists:

Questions about how to use or configure Apache Flume, or problems you run
into while using it, should go to the user mailing list ([EMAIL PROTECTED]).

Questions about API internals, patches, and code reviews belong on the
developer mailing list ([EMAIL PROTECTED]).

Coming back to the issue you reported, I have had this problem before in my
early days with Flume.

The root cause of your problem is visible in the log output you included
in your message.

You cannot point the spooling directory source at a directory where the
files are constantly being updated.

If a file is modified after Flume has picked it up from the spooling
directory, the source will throw an exception, which is exactly what
happened here.
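
One common way to avoid this (a minimal sketch, not something from your
setup; the class and directory names below are hypothetical) is to have
the producer write each log file to a staging directory first, and only
move it into the spool directory once it is complete. A move within the
same filesystem is atomic, so Flume never sees a half-written file:

import java.io.IOException;
import java.nio.file.*;

public class SpoolHandoff {
    // Hypothetical helper: the producer finishes writing the file in a
    // staging directory, then moves it into the Flume spool directory
    // in one step. ATOMIC_MOVE requires that both directories live on
    // the same filesystem.
    public static void handOff(Path finishedFile, Path spoolDir) throws IOException {
        Path target = spoolDir.resolve(finishedFile.getFileName());
        Files.move(finishedFile, target, StandardCopyOption.ATOMIC_MOVE);
    }
}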

You can check out the user guide for more info:

http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source

From the user guide, the SpoolingDirectorySource expects that only
immutable, uniquely named files are dropped in the spooling directory. If
duplicate names are used, or files are modified while being read, the
source will fail with an error message. For some use cases this may require
adding unique identifiers (such as a timestamp) to log file names when they
are copied into the spooling directory.
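
To satisfy the unique-name requirement, something along these lines works
(again a hypothetical sketch, just illustrating the guide's timestamp
suggestion):

import java.nio.file.*;

public class UniqueSpoolName {
    // Append a millisecond timestamp so a rotated log never reuses a
    // name the spooling directory source has already processed.
    public static Path uniqueTarget(Path spoolDir, String baseName) {
        return spoolDir.resolve(baseName + "." + System.currentTimeMillis() + ".log");
    }
}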

On 23 April 2013 01:17, Venkateswarlu Danda <[EMAIL PROTECTED]> wrote:

> Hello
>
> I am generating files continuously in a local folder on my base machine.
> How can I use Flume to stream the generated files from the local folder
> to HDFS?
>
> I have written some configuration, but it is giving some errors. Please
> give me a sample configuration.
>
> This is my configuration file:
>
> agents.sources=spooldir-source
> agents.sinks=hdfs-sink
> agents.channels=ch1
>
> agents.sources.spooldir-source.type=spooldir
>
> agents.sources.spooldir-source.spoolDir=/apache-tomcat-7.0.39/logs/MultiThreadLogs
> agents.sources.spooldir-source.fileSuffix=.SPOOL
> agents.sources.spooldir-source.fileHeader=true
> agents.sources.spooldir-source.bufferMaxLineLength=50000
>
> agents.sinks.hdfs-sink.type=hdfs
> agents.sinks.hdfs-sink.hdfs.path=hdfs://cloudx-740-677:54300/multipleFiles/
> agents.sinks.hdfs-sink.hdfs.rollSize=12553700
> agents.sinks.hdfs-sink.hdfs.rollCount=12553665
> agents.sinks.hdfs-sink.hdfs.rollInterval=3000
> agents.sinks.hdfs-sink.hdfs.fileType=DataStream
> agents.sinks.hdfs-sink.hdfs.writeFormat=Text
>
> agents.channels.ch1.type=file
>
> agents.sources.spooldir-source.channels=ch1
> agents.sinks.hdfs-sink.channel=ch1
>
>
>
> If I add a large file (10 MB), I get this error:
>
>
> 13/04/18 16:11:21 ERROR source.SpoolDirectorySource: Uncaught exception in Runnable
> java.lang.IllegalStateException: File has been modified since being read: /apache-tomcat-7.0.39/logs/MultiThreadLogs/log_0.txt
>         at org.apache.flume.client.avro.SpoolingFileLineReader.retireCurrentFile(SpoolingFileLineReader.java:237)
>         at org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:185)
>         at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)