Re: .SpoolingFileLineReader warning....
Hey Brock,

I can do some more testing on my side with smaller files, as well as
comparing a mv vs. a cp. I do believe that a slight delay would be helpful,
since people will be moving/copying large files around.
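
To make the "slight delay" idea concrete, here is a rough sketch of the kind of check being discussed: only treat a file as ready once its mtime is at least some minimum age. This is just an illustration and not how SpoolingFileLineReader works. The class name MtimeGate, the method isSettled, and the thresholds used in main are all made up for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for illustration only; not part of Flume. */
final class MtimeGate {
    private MtimeGate() {}

    /**
     * Returns true when the file's last-modified time is at least
     * minAge before the supplied "now" instant.
     */
    static boolean isSettled(Path file, Duration minAge, Instant now) throws IOException {
        Instant mtime = Files.getLastModifiedTime(file).toInstant();
        // Settled means: mtime + minAge is not later than now.
        return !mtime.plus(minAge).isAfter(now);
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("spool-", ".log");
        try {
            Instant now = Instant.now();

            // Pretend the last write happened a minute ago.
            Files.setLastModifiedTime(f, FileTime.from(now.minusSeconds(60)));
            System.out.println("old file settled:   " + isSettled(f, Duration.ofSeconds(1), now));

            // Pretend the file was written just now; use a generous threshold
            // so coarse filesystem timestamp granularity does not matter.
            Files.setLastModifiedTime(f, FileTime.from(now));
            System.out.println("fresh file settled: " + isSettled(f, Duration.ofSeconds(30), now));
        } finally {
            Files.deleteIfExists(f);
        }
    }
}
```

A copy that stalls for longer than the threshold would still pass a check like this, so copying in under a temporary name and then renaming is probably still the safer route for large files.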

Regards,

Dano
On Nov 20, 2012 5:26 AM, "Brock Noland" <[EMAIL PROTECTED]> wrote:

> Thinking about this more, I think it's probably going to be quite
> common for people to cp large files into the spooling directory.
> Patrick, what do you think about waiting until the mtime is, say, 1
> second old?
>
> Brock
>
> On Mon, Nov 19, 2012 at 5:29 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
> > My guess is that the file does not have the correct permissions while
> > being copied.
> >
> > [noland@localhost cp-test]$ cp -p test-0 test-1 & sleep 0.1; ls -al test*
> > [1] 18780
> > -rw-rw-r-- 1 noland noland 1048576000 Nov 19 17:25 test-0
> > -rw------- 1 noland noland   52334592 Nov 19 17:27 test-1
> >
> >
> > For large files, it probably makes sense to copy the file in as .file
> > and then rename it to file.
> >
> > Brock
> >
> > On Mon, Nov 19, 2012 at 5:04 PM, Patrick Wendell <[EMAIL PROTECTED]> wrote:
> >> The spooling source gets a directory listing, then reads each file, then
> >> renames it to X.COMPLETED. Is it possible some other process deleted that
> >> file between when Flume listed the directory and when it tried to open the
> >> file? Otherwise, I'm confused why the file would not be present in the
> >> listing you give here.
> >>
> >>
> >> On Mon, Nov 19, 2012 at 6:03 PM, Patrick Wendell <[EMAIL PROTECTED]> wrote:
> >>>
> >>> Hey Dan,
> >>>
> >>> You say that it seems like Flume has already processed the log... why do
> >>> you think that?
> >>>
> >>> When you listed the directory contents I don't see the original or the
> >>> COMPLETED version of the file that Flume is complaining about:
> >>>
> >>> /clickstream.log-2012-11-17-1353163623
> >>>
> >>> doesn't appear in the
> >>>
> >>> /mnt/flume/clickstream/
> >>>
> >>> directory listing anywhere.
> >>>
> >>>
> >>> On Mon, Nov 19, 2012 at 2:33 PM, Dan Young <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>> Hello Brock,
> >>>>
> >>>> It seems like we get this message each time that logrotate runs and is in
> >>>> the process of copying the file to the SpoolingDirectory. It seems that
> >>>> Flume starts reading the file as soon as it shows up in the
> >>>> SpoolingDirectory... Maybe it's trying to read the file while it's still
> >>>> being written to?
> >>>>
> >>>> 2012-11-19 19:27:27,924 (pool-12-thread-1) [WARN - org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:328)] Could not find file: /mnt/flume/clickstream2/clickstream2.log-2012-11-19-1353353239
> >>>> java.io.FileNotFoundException: /mnt/flume/clickstream2/clickstream2.log-2012-11-19-1353353239 (Permission denied)
> >>>>     at java.io.FileInputStream.open(Native Method)
> >>>>     at java.io.FileInputStream.<init>(FileInputStream.java:138)
> >>>>     at java.io.FileReader.<init>(FileReader.java:72)
> >>>>     at org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:322)
> >>>>     at org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:172)
> >>>>     at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
> >>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >>>>     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> >>>>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> >>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
> >>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> >>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)