Flume >> mail # user >> spoolDir source problem


Paul Chavez 2013-04-11, 22:15
Paul Chavez 2013-04-12, 18:41
Israel Ekpo 2013-04-12, 20:14
Paul Chavez 2013-04-12, 20:37
Israel Ekpo 2013-04-12, 20:42
RE: spoolDir source problem
We already have a CentOS cluster running half a dozen Flume nodes. We've been feeding it production data for about 6 months and have been very pleased with it so far. We are just looking to get agents on our app servers to smooth out cluster upgrades.
Thanks for your help,
Paul

________________________________
From: Israel Ekpo [mailto:[EMAIL PROTECTED]]
Sent: Friday, April 12, 2013 1:42 PM
To: [EMAIL PROTECTED]
Subject: Re: spoolDir source problem

It might be a good idea to set up Ubuntu 12 on a virtual machine using VirtualBox and then set up your test environment there.

This will give you some confidence that the setup works before you deploy it.

I don't really use Windows for development, so unfortunately I am not able to help you troubleshoot this.

On 12 April 2013 16:37, Paul Chavez <[EMAIL PROTECTED]> wrote:
1. Flume 1.3.1 I believe, whatever is packaged with latest CDH distribution.
2. Windows Server 2008 R2
3. The meta files are created by the Flume agent, so it should have full rights. I went through and recreated the spool directory with more explicit permissions. It wasn't clear from the exception whether the issue was with the meta files or with the files I'm putting in the spool dir. Unfortunately it didn't have any effect: I recreated the directory with full access for everyone and got the same issue.

I'm OK with not having this functionality on Windows; I just don't want to waste time on a solution that won't work. My current solution uses the Avro client to send files to a Flume agent on our HDFS cluster running an Avro source. The main reason I want a local Windows agent is for the HTTP Source, which I've already been able to verify as working.

Thanks,
Paul
________________________________
From: Israel Ekpo [mailto:[EMAIL PROTECTED]]
Sent: Friday, April 12, 2013 1:15 PM
To: [EMAIL PROTECTED]
Subject: Re: spoolDir source problem

Paul,

I have the following questions:

(1) What version of Flume are you using?

(2) What version of Windows are you using?

(3) Does the user running Flume have permissions to read/write in the directories used for the spooling and channels?
This will help narrow down the reasons why this could be happening.

Nevertheless, it looks like the issue you are encountering is platform specific (just on Windows).

From your log messages, it appears the class in the calling thread is org.apache.flume.client.avro.ReliableSpoolingFileEventReader.

However, the problem is happening in org.apache.flume.serialization.DurablePositionTracker.getInstance().

Within the source code, there is a comment on line 94 in the file stating that on Windows renames are not really stable and the logic is not atomic.

There is also a recommendation for implementing a recovery procedure so that if the file does not exist on startup, it will check for a rolled version before attempting to create a brand new file.
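
To make the recommended recovery step concrete, here is a minimal sketch in Python. It is an illustration only, not Flume's Java code: the function names are hypothetical, the `<meta name>*.tmp` pattern is modeled on the temp files reported in this thread, and picking the newest candidate by modification time is an assumption.

```python
import glob
from pathlib import Path
from typing import Optional


def find_rolled_copy(meta_path: Path) -> Optional[Path]:
    """Return the newest '<meta name>*.tmp' sibling of meta_path, or None."""
    # Escape the file name so characters like '[' are matched literally.
    pattern = glob.escape(meta_path.name) + "*.tmp"
    candidates = sorted(
        meta_path.parent.glob(pattern),
        key=lambda p: p.stat().st_mtime,  # assumption: newest copy wins
    )
    return candidates[-1] if candidates else None


def recover_or_create(meta_path: Path) -> str:
    """Open-time recovery: prefer a rolled copy over a brand new file."""
    if meta_path.exists():
        return "existing"
    rolled = find_rolled_copy(meta_path)
    if rolled is not None:
        # A previous run got as far as writing the rolled copy but not
        # the final rename, so finish that rename now.
        rolled.replace(meta_path)
        return "recovered"
    # Nothing to recover from: start with an empty tracker file.
    meta_path.touch()
    return "created"
```

The point of the check is that a crash between deleting the old meta file and renaming the temp copy would otherwise lose the stored position.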

If it is possible for you to move to a different environment other than Windows, that would be great.

If this is not possible, then try deleting your spooling directory "c:\flume_data\spool\web" which will also remove the metadata files recursively.

Back up all the pending files that have not yet been processed in the spooling directory before deleting the folder so that you can put the files back after the directory is recreated.

Then restart your agent to see if this works.
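
The reset described above could be scripted roughly as follows. This is a sketch under stated assumptions: the agent is stopped while it runs, the helper names and the backup location are hypothetical, and already-processed files are recognized by the `.COMPLETED` suffix (the documented default of the spooling source's `fileSuffix` setting; adjust it if your agent is configured differently).

```python
import shutil
from pathlib import Path
from typing import List


def reset_spool_dir(spool_dir: Path, backup_dir: Path,
                    done_suffix: str = ".COMPLETED") -> List[Path]:
    """Back up pending files, then recreate spool_dir empty.

    Returns the paths of the backup copies.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for entry in sorted(spool_dir.iterdir()):
        # Only top-level regular files are pending input; the metadata
        # lives in a subdirectory and is intentionally not backed up.
        if entry.is_file() and not entry.name.endswith(done_suffix):
            target = backup_dir / entry.name
            shutil.copy2(entry, target)
            saved.append(target)
    # Removing the directory also removes the metadata files under it.
    shutil.rmtree(spool_dir)
    spool_dir.mkdir(parents=True)
    return saved


def restore_pending(saved: List[Path], spool_dir: Path) -> None:
    """Move the backed-up files into the recreated spool directory."""
    for path in saved:
        shutil.move(str(path), str(spool_dir / path.name))
```

For the directory mentioned in this thread, a call might look like `saved = reset_spool_dir(Path(r"c:\flume_data\spool\web"), Path(r"c:\flume_data\spool_backup"))` followed by `restore_pending(saved, Path(r"c:\flume_data\spool\web"))`, where the backup path is made up for the example.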

Let me know if this helps.

On 12 April 2013 14:41, Paul Chavez <[EMAIL PROTECTED]> wrote:
Anyone have any ideas on this? I can't even find the class throwing the exception to try and see what it is doing. I would really like to use this on Windows, but would like to know at least if there's some compatibility issue so I can move on.

thanks,
Paul
________________________________
From: Paul Chavez [mailto:[EMAIL PROTECTED]]
Sent: Thursday, April 11, 2013 3:15 PM
To: [EMAIL PROTECTED]
Subject: spoolDir source problem

Hello,

I've run into a problem with the spoolDir source, on Windows, and am not sure how to proceed.

The agent starts fine, and the source is created without issue and is apparently ready. After the agent starts, a .flumespool directory is created in the path the source is watching. This directory remains empty as long as the agent is idle.

However, as soon as I drop a file into the spool directory (parent to the .flumespool directory), I get a series of errors in the flume log and a file named '.flumespool-main.meta<string of numbers>.tmp' is created in that .flumespool directory at the rate of one per second. The file in the spool directory is never touched as far as I can tell and the /metrics web page shows no movement on the channel or sink. A possibly related note is that the sources don't show in the metrics page, even though the logs say the source(s) are started.

All I have done so far is set the directory security to 'Everyone/Full Control', basically the Windows version of 'chmod 777'.

Any help is appreciated!

thanks,
Paul

Here's what the log shows.
11 Apr 2013 15:11:48,092 INFO  [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:184)  - Starting Source spool_WebLogs
11 Apr 2013 15:11:48,092 INFO  [conf-file-poller-0] (org.apache.flume.node.Application.startAllComponents:184)  - Starting Source http_Default
11 Apr 2013 15:11:48,092 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.source.SpoolDirectorySource.start:66)  - SpoolDirectorySource source starting with directory: c:\flume_data\spool\web
11 Apr 2013 15:11:48,124 INFO  [conf-file-poller-0] (org.mortbay.log.Slf4jLog.info:67)  - Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
11 Apr 2013 15:11:48,139 INFO  [conf-file-poller-0] (org.mortbay.log.Slf4jLog
Nitin Pawar 2013-04-12, 20:52
Paul Chavez 2013-04-12, 21:04
Paul Chavez 2013-04-12, 21:22
Paul Chavez 2013-04-12, 23:24
Israel Ekpo 2013-04-13, 01:15
Paul Chavez 2013-04-15, 18:37