Flume >> mail # user >> .SpoolingFileLineReader warning....


Re: .SpoolingFileLineReader warning....
Coolio, thank you Brock.

Did a quick test with mv vs. cp. This is a single test, repeated a few
times. The smaller file (~60MB) seemed to work fine with mv, but with a
larger file we're seeing the same behavior.

Start with no logs in the spooling directory:

ls -lrt /mnt/flume/clickstream

/mnt/flume/clickstream:
total 0

Review the logs that logrotate will rotate into the spooling directory,
/mnt/flume/clickstream:

ls -lrt /var/log/clickstream
/var/log/clickstream:
total 64112
-rw-rw-r-- 1 ubuntu ubuntu 65648336 Nov 20 16:05 clickstream.log

Review the logrotate config in /etc/logrotate.d. Note that I changed the
postrotate command from cp -p to mv:

/var/log/clickstream/clickstream.log
{
  missingok
  rotate 3
  compress
  delaycompress
  copytruncate
  notifempty
  size 50M
  dateext
  dateformat -%Y-%m-%d-%s
  create 666 ubuntu ubuntu
  postrotate
  mv $1 /mnt/flume/clickstream/ 2>&1
  endscript
}
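
A note on the mv in the postrotate step above: when source and
destination are on different filesystems (an assumption here, since
/var/log and /mnt are often separate mounts), mv cannot simply rename
the file and falls back to copying the data and then deleting the
source. For a large file that leaves a window in which a partially
copied file is visible in the spooling directory. A minimal sketch of
one way to close that window is to copy into a staging directory on the
same filesystem as the spooling directory and then rename into place.
The function name spool_atomically and the staging path are
hypothetical, not part of Flume or logrotate:

```shell
#!/bin/sh
# Hypothetical helper (not part of Flume or logrotate).
# Hands a rotated log to a spooling directory without exposing a
# partially written file to the reader.
spool_atomically() {
    src=$1          # rotated log to hand over
    staging_dir=$2  # ASSUMED to be on the same filesystem as spool_dir
    spool_dir=$3    # directory watched by the spooling source
    name=$(basename "$src")

    mkdir -p "$staging_dir" || return 1

    # Slow part: copy the data where the reader is not looking.
    cp -p "$src" "$staging_dir/$name" || return 1

    # Fast part: within one filesystem mv is a rename, which is atomic,
    # so the file appears in spool_dir complete or not at all.
    mv "$staging_dir/$name" "$spool_dir/$name" || return 1

    # Remove the source only after the handover succeeded.
    rm -f "$src"
}

# Possible use inside postrotate (staging path is an assumption):
#   spool_atomically "$1" /mnt/flume/.clickstream-staging /mnt/flume/clickstream
```

The staging directory is kept outside the spooling directory so the
sketch does not depend on how the source treats subdirectories.
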
I ran logrotate.d/clickstream.POST with the ~60MB file, and everything
looked fine. Now I try a ~190MB file:

ls -lrt /var/log/clickstream
/var/log/clickstream:
total 192336
-rw-rw-r-- 1 ubuntu ubuntu 196945008 Nov 20 16:42 clickstream.log
Run logrotate.d/clickstream.POST, and we see the WARNING in the Flume log:

....
....
20 Nov 2012 16:45:07,117 WARN  [pool-13-thread-1] (org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile:328)  - Could not find file: /mnt/flume/clickstream/clickstream.log-2012-11-20-1353429906
java.io.FileNotFoundException: /mnt/flume/clickstream/clickstream.log-2012-11-20-1353429906 (Permission denied)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at java.io.FileReader.<init>(FileReader.java:72)
    at org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:322)
    at org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:172)
    at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
20 Nov 2012 16:45:13,174 INFO  [hdfs-c1s3-call-runner-5] (org.apache.flume.sink.hdfs.BucketWriter.doOpen:205)  - Creating s3n://X:Y@my-bucket/clicks/2012/11/clicks-2012-11-20-16-40-10.145.184.200-.1353429913104.gz.tmp
....
....

But it seems that Flume did process the log:

ls -lrt /mnt/flume/clickstream/
total 256704
-rw-rw-r-- 1 ubuntu ubuntu  65648336 Nov 20 16:25 clickstream.log-2012-11-20-1353428715.COMPLETED
-rw-rw-r-- 1 ubuntu ubuntu 196945008 Nov 20 16:45 clickstream.log-2012-11-20-1353429906.COMPLETED

Regards,

Dano
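
The delay idea quoted below (only pick up a file once its mtime has
stopped changing for some period) can be sketched from the shell side
as well. This is a rough illustration, not how Flume implements or will
implement it; is_quiet and the age threshold are hypothetical, and
stat -c %Y assumes GNU coreutils as found on Ubuntu:

```shell
#!/bin/sh
# Hypothetical check (not a Flume API): succeed only if FILE was last
# modified at least MIN_AGE seconds ago.
is_quiet() {
    file=$1
    min_age=$2   # seconds; the right value is a tuning decision
    now=$(date +%s)
    # GNU stat prints mtime as seconds since the epoch with -c %Y.
    mtime=$(stat -c %Y "$file") || return 2
    [ $((now - mtime)) -ge "$min_age" ]
}

# Example: list only files that have been untouched for 10 seconds
# (10 is an arbitrary value chosen for illustration).
#   for f in /mnt/flume/clickstream/*; do
#       is_quiet "$f" 10 && echo "$f"
#   done
```

A copy that stalls for longer than the threshold would also look quiet,
so a check like this narrows the window rather than closing it.
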

On Tue, Nov 20, 2012 at 9:21 AM, Brock Noland <[EMAIL PROTECTED]> wrote:

> Yeah I think that makes sense, I have created a JIRA for this
>
> https://issues.apache.org/jira/browse/FLUME-1733
>
> Brock
>
> On Tue, Nov 20, 2012 at 9:02 AM, Dan Young <[EMAIL PROTECTED]> wrote:
> > Hey Brock,
> >
> > I can do some more testing on my side with smaller files as well as
> > doing a mv vs. a cp. I do believe that a slight delay would be helpful
> > since people will be moving/copying large files around.
> >
> > Regards,
> >
> > Dano
> >
> > On Nov 20, 2012 5:26 AM, "Brock Noland" <[EMAIL PROTECTED]> wrote:
> >>
> >> Thinking about this more, I think it's probably going to be quite
> >> common for people to cp large files into the spooling directory.
> >> Patrick, what do you think about waiting until the mtime is say 1