Flume >> mail # user >> Flume not moving data to HDFS or local


Siddharth Tiwari 2013-10-31, 18:52
Paul Chavez 2013-10-31, 19:19
Siddharth Tiwari 2013-10-31, 19:29
Siddharth Tiwari 2013-10-31, 19:46
Paul Chavez 2013-10-31, 21:38
Siddharth Tiwari 2013-11-01, 02:05
RE: Flume not moving data to HDFS or local
Here's a piece of my app server configuration. It's for IIS logs and has an interceptor to pull a timestamp out of the event data. It's backed by a fileChannel and I drop files into the spool directory once a minute.

# SpoolDir source for Weblogs
appserver.sources.spool_WebLogs.type = spooldir
appserver.sources.spool_WebLogs.spoolDir = c:\\flume_data\\spool\\web
appserver.sources.spool_WebLogs.channels = fc_WebLogs
appserver.sources.spool_WebLogs.batchSize = 1000
appserver.sources.spool_WebLogs.bufferMaxLines = 1200
appserver.sources.spool_WebLogs.bufferMaxLineLength = 5000

appserver.sources.spool_WebLogs.interceptors = add_time
appserver.sources.spool_WebLogs.interceptors.add_time.type = regex_extractor
appserver.sources.spool_WebLogs.interceptors.add_time.regex = \\t(\\d{4}-\\d{2}-\\d{2}.\\d{2}:\\d{2})
appserver.sources.spool_WebLogs.interceptors.add_time.serializers = millis
appserver.sources.spool_WebLogs.interceptors.add_time.serializers.millis.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
appserver.sources.spool_WebLogs.interceptors.add_time.serializers.millis.name = timestamp
appserver.sources.spool_WebLogs.interceptors.add_time.serializers.millis.pattern = yyyy-MM-dd HH:mm
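
The excerpt above covers only the source. As a rough sketch of the pieces that would normally surround it, assuming a file channel feeding an HDFS sink directly: the sink name hdfs_WebLogs, the channel directories, and the HDFS path are all hypothetical and are not taken from the original configuration.

```properties
# Declare the components of the agent named "appserver"
appserver.sources = spool_WebLogs
appserver.channels = fc_WebLogs
appserver.sinks = hdfs_WebLogs

# File channel backing the spooldir source (directories are placeholders)
appserver.channels.fc_WebLogs.type = file
appserver.channels.fc_WebLogs.checkpointDir = c:\\flume_data\\channel\\web\\checkpoint
appserver.channels.fc_WebLogs.dataDirs = c:\\flume_data\\channel\\web\\data

# HDFS sink (name and path are placeholders); the %Y/%m/%d escapes are
# resolved from the "timestamp" header that the add_time interceptor sets
appserver.sinks.hdfs_WebLogs.type = hdfs
appserver.sinks.hdfs_WebLogs.channel = fc_WebLogs
appserver.sinks.hdfs_WebLogs.hdfs.path = hdfs://namenode/flume/weblogs/%Y/%m/%d
```

Note that a source lists its channels with the plural "channels" property, while a sink takes a single "channel".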

Hope that helps,
Paul Chavez
From: Siddharth Tiwari [mailto:[EMAIL PROTECTED]]
Sent: Thursday, October 31, 2013 7:05 PM
To: [EMAIL PROTECTED]
Subject: RE: Flume not moving data to HDFS or local

Can you describe the process for setting up a spooling directory source? I am sorry, I do not know how to do that. If you can give me a step-by-step description of how to configure it, and of the changes I need to make in my conf to get it done, I will be really thankful. Appreciate your help :)
*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God."
"Maybe other people will try to limit me but I don't limit myself"

________________________________
From: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>
Date: Thu, 31 Oct 2013 14:38:54 -0700
Subject: RE: Flume not moving data to HDFS or local
It should commit when any one of the file roll configuration values is hit. There's a list of them and their defaults in the Flume User Guide.
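
As an illustration of those roll settings, here is a minimal sketch for an HDFS sink. The agent and sink names (agent, hdfs_sink) are placeholders, and the values shown are the defaults listed in the Flume User Guide; check them against the guide for the version in use.

```properties
# Placeholder names: agent "agent", sink "hdfs_sink"
# Roll (close the .tmp file and rename it) after this many seconds; 0 disables
agent.sinks.hdfs_sink.hdfs.rollInterval = 30
# Roll once the file reaches this many bytes; 0 disables
agent.sinks.hdfs_sink.hdfs.rollSize = 1024
# Roll after this many events have been written; 0 disables
agent.sinks.hdfs_sink.hdfs.rollCount = 10
```

Whichever trigger fires first closes the file, at which point the .tmp suffix is dropped. Comments go on their own lines because a properties file treats trailing text as part of the value.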

For managing new files on your app servers, the best option right now seems to be a spooling directory source along with some kind of cron job that runs locally on the app servers to drop files in the spool directory when they are ready. In my case I run a job that executes a custom script to checkpoint a file that is appended to all day long, creating incremental files every minute to drop in the spool directory.
From: Siddharth Tiwari [mailto:[EMAIL PROTECTED]]
Sent: Thursday, October 31, 2013 12:47 PM
To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>
Subject: RE: Flume not moving data to HDFS or local
It got resolved; it was due to the wrong version of the guava jar file in the Flume lib directory. But I can still see a .tmp extension on the file in HDFS. When does it actually get committed? :) One more question, though: what should I change in my configuration file to capture new files being generated in a directory on a remote machine?
Say, for example, one new file is generated every hour in my web server's hostlog directory. What do I change in my configuration so that the new file lands directly in HDFS, compressed?
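
For the compression part of the question, a minimal sketch of the relevant HDFS sink settings follows. The agent and sink names (agent, hdfs_sink) are placeholders, and gzip is only one of the codecs the sink accepts.

```properties
# Placeholder names: agent "agent", sink "hdfs_sink"
agent.sinks.hdfs_sink.type = hdfs
# Write a compressed stream instead of the default SequenceFile
agent.sinks.hdfs_sink.hdfs.fileType = CompressedStream
# Codec to use; required when fileType is CompressedStream
agent.sinks.hdfs_sink.hdfs.codeC = gzip
```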

________________________________
From: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>
Subject: RE: Flume not moving data to HDFS or local
Date: Thu, 31 Oct 2013 19:29:36 +0000
Hi Paul

I see the following error:

13/10/31 12:27:01 ERROR hdfs.HDFSEventSink: process failed
java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build()Lcom/google/common/cache/Cache;
          at org.apache.hadoop.hdfs.DomainSocketFactory.<init>(DomainSocketFactory.java:45)
          at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:490)
          at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:445)
          at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
          at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2429)
          at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
          at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2463)
          at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2445)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:363)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:165)
          at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:347)
          at org.apache.hadoop.fs.Path.getFileSystem(Path.java:275)
          at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:186)
          at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:48)
          at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:155)
          at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:152)
          at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:125)
          at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:152)
          at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:307)
          at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:717)
          at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:714)
          at java.util.concurrent.FutureTask.run(FutureTask.java:262)
          at java.util.concurrent
Siddharth Tiwari 2013-11-01, 06:17