Flume >> mail # user >> .SpoolingFileLineReader warning....


Re: .SpoolingFileLineReader warning....
Coolio, thank you Brock.

I did a quick test with mv vs. cp. This is a single test, repeated a few
times. The smaller file (~60MB) seemed to work fine with a mv, but with a
larger file we're seeing the same behavior.

Start with no logs in the spooling directory:

ls -lrt /mnt/flume/clickstream

/mnt/flume/clickstream:
total 0

Review the logs that will be rotated via logrotate.d into the spooling
directory, /mnt/flume/clickstream:

ls -lrt /var/log/clickstream
/var/log/clickstream:
total 64112
-rw-rw-r-- 1 ubuntu ubuntu 65648336 Nov 20 16:05 clickstream.log

Review the logrotate config in /etc/logrotate.d. Note that I changed the
postrotate command from cp -p to mv:

/var/log/clickstream/clickstream.log
{
  missingok
  rotate 3
  compress
  delaycompress
  copytruncate
  notifempty
  size 50M
  dateext
  dateformat -%Y-%m-%d-%s
  create 666 ubuntu ubuntu
  postrotate
  mv $1 /mnt/flume/clickstream/ 2>&1
  endscript
}
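
One possible way to keep the spooling source from seeing a half-written file, sketched under assumptions this thread does not confirm: if /var/log and /mnt are on different filesystems, mv falls back to copy-and-delete, so it behaves much like cp and the file appears in the spooling directory before it is complete. Copying into a staging directory on the same filesystem as the spooling directory, and only then renaming, makes the final step a single rename. The function name spool_handoff and the staging directory are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: hand a rotated log to the spooling directory in two steps.
# Usage: spool_handoff SRC STAGING_DIR SPOOL_DIR
# Assumption: STAGING_DIR and SPOOL_DIR are on the same filesystem,
# so the final mv is a rename rather than a copy.
spool_handoff() {
  src=$1
  staging_dir=$2
  spool_dir=$3
  name=$(basename "$src")

  mkdir -p "$staging_dir" || return 1
  # Slow step: the copy happens where the spooling source is not looking.
  cp -p "$src" "$staging_dir/$name" || return 1
  # Fast step: rename within one filesystem; the file shows up complete.
  mv "$staging_dir/$name" "$spool_dir/$name" || return 1
  # Remove the original only after the hand-off succeeded.
  rm -f "$src"
}
```

A postrotate step could then call something like spool_handoff "$1" /mnt/flume/staging /mnt/flume/clickstream, where /mnt/flume/staging is an assumed path, not one from this setup.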

I ran logrotate.d/clickstream.POST with the ~60MB file, and everything
looked fine. Now I try a ~190MB file:

ls -lrt /var/log/clickstream
/var/log/clickstream:
total 192336
-rw-rw-r-- 1 ubuntu ubuntu 196945008 Nov 20 16:42 clickstream.log

Run logrotate.d/clickstream.POST, and we see the WARNING in the Flume log:

....
....
20 Nov 2012 16:45:07,117 WARN  [pool-13-thread-1] (org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile:328)  - Could not find file: /mnt/flume/clickstream/clickstream.log-2012-11-20-1353429906
java.io.FileNotFoundException: /mnt/flume/clickstream/clickstream.log-2012-11-20-1353429906 (Permission denied)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at java.io.FileReader.<init>(FileReader.java:72)
    at org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:322)
    at org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:172)
    at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
20 Nov 2012 16:45:13,174 INFO  [hdfs-c1s3-call-runner-5] (org.apache.flume.sink.hdfs.BucketWriter.doOpen:205)  - Creating s3n://X:Y@my-bucket/clicks/2012/11/clicks-2012-11-20-16-40-10.145.184.200-.1353429913104.gz.tmp
....
....

But it seems that Flume did process the log:

ls -lrt /mnt/flume/clickstream/
total 256704
-rw-rw-r-- 1 ubuntu ubuntu  65648336 Nov 20 16:25 clickstream.log-2012-11-20-1353428715.COMPLETED
-rw-rw-r-- 1 ubuntu ubuntu 196945008 Nov 20 16:45 clickstream.log-2012-11-20-1353429906.COMPLETED
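
The "slight delay" idea discussed in this thread could be approximated on the producer side with a check like the one below. This is only a sketch of the general technique (treat a file as ready once it has not been modified for some number of seconds); it is not what Flume does, and the function name is_quiet and its threshold argument are hypothetical. It assumes GNU stat, as shipped on Ubuntu.

```shell
#!/bin/sh
# Hypothetical check: succeed only if FILE was last modified at least
# MIN_AGE seconds ago. Relies on GNU stat (-c %Y prints mtime as epoch seconds).
# Usage: is_quiet FILE MIN_AGE
is_quiet() {
  file=$1
  min_age=$2
  # Return 2 if the file cannot be stat'ed (for example, it does not exist).
  mtime=$(stat -c %Y "$file") || return 2
  now=$(date +%s)
  # Exit status of the function is the result of this comparison.
  [ $((now - mtime)) -ge "$min_age" ]
}
```

A script could loop over candidate files and hand off only those for which is_quiet "$f" 5 succeeds; the 5-second threshold is an arbitrary illustration.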
Regards,

Dano

On Tue, Nov 20, 2012 at 9:21 AM, Brock Noland <[EMAIL PROTECTED]> wrote:

> Yeah I think that makes sense, I have created a JIRA for this
>
> https://issues.apache.org/jira/browse/FLUME-1733
>
> Brock
>
> On Tue, Nov 20, 2012 at 9:02 AM, Dan Young <[EMAIL PROTECTED]> wrote:
> > Hey Brock,
> >
> > I can do some more testing on my side with smaller files as well as doing a
> > mv vs a cp. I do believe that a slight delay would be helpful since people
> > will be moving/copying large files around.
> >
> > Regards,
> >
> > Dano
> >
> > On Nov 20, 2012 5:26 AM, "Brock Noland" <[EMAIL PROTECTED]> wrote:
> >>
> >> Thinking about this more, I think it's probably going to be quite
> >> common for people to cp large files into the spooling directory.
> >> Patrick, what do you think about waiting until the mtime is say 1