Flume, mail # user - HDFS Sink keeps .tmp files and closes with exception


Re: HDFS Sink keeps .tmp files and closes with exception
Bhaskar V. Karambelkar 2012-10-18, 22:39
Use the CDH3u5 Hadoop distribution.
Hadoop 1.x doesn't handle client JVM shutdowns correctly.
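If upgrading isn't an option right away, one possible workaround (a sketch only, not verified against this setup) is to disable Hadoop's automatic FileSystem shutdown hook with `fs.automatic.close` in the agent's core-site.xml, so the client filesystem isn't torn down before Flume's HDFS sink has closed and renamed its .tmp files:

```xml
<!-- core-site.xml on the Flume agent: keep the FileSystem open during
     JVM shutdown so the HDFS sink can flush, close, and rename .tmp
     files itself. Untested sketch; behavior may vary by Hadoop version. -->
<property>
  <name>fs.automatic.close</name>
  <value>false</value>
</property>
```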

Search the mail threads for my name and you'll find more answers.
But as things stand, you have no choice but to use Cloudera's CDH3u5.

thanks

On Thu, Oct 18, 2012 at 4:18 PM, Nishant Neeraj
<[EMAIL PROTECTED]> wrote:
> I am working on a POC using
>
> flume-ng version Flume 1.2.0-cdh4.1.1
> Hadoop 1.0.4
>
> The config looks like this
>
> #Flume agent configuration
> agent1.sources = avroSource1
> agent1.sinks = fileSink1
> agent1.channels = memChannel1
>
> agent1.sources.avroSource1.type = avro
> agent1.sources.avroSource1.channels = memChannel1
> agent1.sources.avroSource1.bind = 0.0.0.0
> agent1.sources.avroSource1.port = 4545
>
> agent1.sources.avroSource1.interceptors = b
> agent1.sources.avroSource1.interceptors.b.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
>
> agent1.sinks.fileSink1.type = hdfs
> agent1.sinks.fileSink1.channel = memChannel1
> agent1.sinks.fileSink1.hdfs.path = /flume/agg1/%y-%m-%d
> agent1.sinks.fileSink1.hdfs.filePrefix = agg
> agent1.sinks.fileSink1.hdfs.rollInterval = 0
> agent1.sinks.fileSink1.hdfs.rollSize = 0
> agent1.sinks.fileSink1.hdfs.rollCount = 0
> agent1.sinks.fileSink1.hdfs.fileType = DataStream
> agent1.sinks.fileSink1.hdfs.writeFormat = Text
>
>
> agent1.channels.memChannel1.type = memory
> agent1.channels.memChannel1.capacity = 1000
> agent1.channels.memChannel1.transactionCapacity = 1000
>
>
> Basically, I do not want to roll the file at all. I just want to tail the
> file and watch the show from the Hadoop UI. The problem is that it does not
> work. The console keeps showing,
>
> agg.1350590350462.tmp 0 KB    2012-10-18 19:59
>
> The Flume console shows events getting pushed. When I stop Flume, I see the
> file gets populated, but the '.tmp' is still in the file name, and I see
> this exception on close.
>
> 2012-10-18 20:06:49,315 (hdfs-fileSink1-call-runner-8) [DEBUG -
> org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:254)]
> Closing /flume/agg1/12-10-18/agg.1350590350462.tmp
> 2012-10-18 20:06:49,316 (hdfs-fileSink1-call-runner-8) [WARN -
> org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:260)]
> failed to close() HDFSWriter for file
> (/flume/agg1/12-10-18/agg.1350590350462.tmp). Exception follows.
> java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
> at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3667)
> at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
> at org.apache.flume.sink.hdfs.HDFSDataStream.close(HDFSDataStream.java:103)
> at org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:257)
> at org.apache.flume.sink.hdfs.BucketWriter.access$400(BucketWriter.java:50)
> at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:243)
> at org.apache.flume.sink.hdfs.BucketWriter$3.run(BucketWriter.java:240)
> at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:127)
> at org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:240)
> at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:748)
> at org.apache.flume.sink.hdfs.HDFSEventSink$3.call(HDFSEventSink.java:745)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:679)
>
>
> Thanks
> Nishant
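For anyone puzzling over how the config above maps events into the dated directory: the TimestampInterceptor stamps each event with a `timestamp` header, and the HDFS sink substitutes the `%y-%m-%d` escapes in `hdfs.path` from that header using strftime-style patterns. A rough illustration in Python (not Flume's actual code; I've assumed UTC for determinism, while Flume uses the agent's local timezone by default):

```python
from datetime import datetime, timezone

def resolve_hdfs_path(path_pattern: str, timestamp_ms: int) -> str:
    """Approximate Flume's escape substitution for an HDFS sink path:
    %y/%m/%d are filled in from the event's 'timestamp' header (ms)."""
    dt = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return dt.strftime(path_pattern)

# The counter in agg.1350590350462.tmp is a millisecond timestamp,
# which lands the file in the dated directory seen in the close log:
print(resolve_hdfs_path("/flume/agg1/%y-%m-%d", 1350590350462))
# -> /flume/agg1/12-10-18
```

This matches the path `/flume/agg1/12-10-18/agg.1350590350462.tmp` in the exception above.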