Flume >> mail # user >> .SpoolingFileLineReader warning....


Re: .SpoolingFileLineReader warning....
Thinking about this more, I think it's probably going to be quite
common for people to cp large files into the spooling directory.
Patrick, what do you think about waiting until the mtime is, say, 1
second old before reading a file?

Brock
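
A minimal sketch of the mtime check suggested above, not Flume's actual
implementation. The class name, method names, and the 1000 ms threshold
are hypothetical; the idea is only that a file whose modification time is
still changing is probably still being copied and should be skipped on
this pass.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of Flume.
class SettledFileFilter {
    // Assumed threshold, taken from the "1 second old" suggestion.
    static final long MIN_AGE_MILLIS = 1000L;

    // True when the file was last modified at least minAgeMillis ago.
    static boolean isSettled(long lastModifiedMillis, long nowMillis, long minAgeMillis) {
        return nowMillis - lastModifiedMillis >= minAgeMillis;
    }

    // Lists regular files in spoolDir that have not been modified recently.
    static List<File> settledFiles(File spoolDir, long nowMillis) {
        List<File> result = new ArrayList<>();
        File[] candidates = spoolDir.listFiles();
        if (candidates == null) {
            return result; // not a directory, or an I/O error occurred
        }
        for (File f : candidates) {
            if (f.isFile() && isSettled(f.lastModified(), nowMillis, MIN_AGE_MILLIS)) {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File f : settledFiles(dir, System.currentTimeMillis())) {
            System.out.println(f.getPath());
        }
    }
}
```

This is a heuristic: a copy that stalls for longer than the threshold
would still be picked up early, and some filesystems only record mtime
with one- or two-second resolution.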

On Mon, Nov 19, 2012 at 5:29 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
> My guess is that the file does not have the correct permissions while
> being copied.
>
> [noland@localhost cp-test]$ cp -p test-0 test-1 & sleep 0.1; ls -al test*
> [1] 18780
> -rw-rw-r-- 1 noland noland 1048576000 Nov 19 17:25 test-0
> -rw------- 1 noland noland   52334592 Nov 19 17:27 test-1
>
>
> For large files, it probably makes sense to copy the file in as .file
> and then rename it to file.
>
> Brock
>
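
A sketch of the copy-then-rename approach described above, for a producer
that can run Java. The class and method names are hypothetical, and it
assumes the spooling source skips names that start with a dot, as the
suggestion implies. The rename only stays atomic when the hidden file and
the final file are in the same directory on the same filesystem.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: not part of Flume.
class SpoolDrop {
    // Copies source into spoolDir under a dot-prefixed name, then renames it
    // to its final name so a reader never sees a partially written file.
    static Path dropIntoSpool(Path source, Path spoolDir) throws IOException {
        String name = source.getFileName().toString();
        Path hidden = spoolDir.resolve("." + name);
        Path visible = spoolDir.resolve(name);

        // The slow step: the file is incomplete while this runs.
        Files.copy(source, hidden,
                StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.COPY_ATTRIBUTES);

        // The fast step: throws AtomicMoveNotSupportedException if the
        // filesystem cannot perform the move atomically.
        return Files.move(hidden, visible, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SpoolDrop <source-file> <spool-dir>");
            return;
        }
        System.out.println(dropIntoSpool(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```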
> On Mon, Nov 19, 2012 at 5:04 PM, Patrick Wendell <[EMAIL PROTECTED]> wrote:
>> The spooling source gets a directory listing, then reads each file, then
>> renames it to X.COMPLETED. Is it possible some other process deleted that
>> file between when Flume listed the directory and when it tried to open the
>> file? Otherwise, I'm confused why the file would not be present in the
>> listing you give here.
>>
>>
>> On Mon, Nov 19, 2012 at 6:03 PM, Patrick Wendell <[EMAIL PROTECTED]> wrote:
>>>
>>> Hey Dan,
>>>
>>> You say that it seems like Flume has already processed the log... why do
>>> you think that?
>>>
>>> When you listed the directory contents I don't see the original or the
>>> COMPLETED version of the file that Flume is complaining about:
>>>
>>> /clickstream.log-2012-11-17-1353163623
>>>
>>> doesn't appear in the
>>>
>>> /mnt/flume/clickstream/
>>>
>>> directory listing anywhere.
>>>
>>>
>>> On Mon, Nov 19, 2012 at 2:33 PM, Dan Young <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Hello Brock,
>>>>
>>>> It seems like we get this message each time logrotate runs and is in
>>>> the process of copying the file to the SpoolingDirectory. It seems that
>>>> Flume starts reading the file as soon as it shows up in the
>>>> SpoolingDirectory. Maybe it's trying to read the file while it's still
>>>> being written to?
>>>>
>>>> 2012-11-19 19:27:27,924 (pool-12-thread-1) [WARN -
>>>> org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:328)]
>>>> Could not find file:
>>>> /mnt/flume/clickstream2/clickstream2.log-2012-11-19-1353353239
>>>> java.io.FileNotFoundException:
>>>> /mnt/flume/clickstream2/clickstream2.log-2012-11-19-1353353239 (Permission
>>>> denied)
>>>> at java.io.FileInputStream.open(Native Method)
>>>> at java.io.FileInputStream.<init>(FileInputStream.java:138)
>>>> at java.io.FileReader.<init>(FileReader.java:72)
>>>> at
>>>> org.apache.flume.client.avro.SpoolingFileLineReader.getNextFile(SpoolingFileLineReader.java:322)
>>>> at
>>>> org.apache.flume.client.avro.SpoolingFileLineReader.readLines(SpoolingFileLineReader.java:172)
>>>> at
>>>> org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:135)
>>>> at
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> at
>>>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>> at java.lang.Thread.run(Thread.java:722)
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Nov 17, 2012 at 9:15 AM, Brock Noland <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Ok, do you mind sharing your log rotate config to see if we can
>>>>> reproduce?
>>>>>
>>>>> --
>>>>> Brock Noland
>>>>> Sent with Sparrow
>>>>>
>>>>> On Saturday, November 17, 2012 at 10:01 AM, Dan Young wrote:
