Flume user mailing list: HDFS Sink stops writing events because HDFSWriter failed to append and close a file


Ashish Tadose 2013-05-24, 10:38
Jeff Lord 2013-05-28, 21:38
Re: HDFS Sink stops writing events because HDFSWriter failed to append and close a file
Hi Jeff,

We are using flume-ng version 1.3.

-Ashish
On Wednesday 29 May 2013 03:08 AM, Jeff Lord wrote:
> Hi Ashish,
>
> What version of flume are you running?
>
> flume-ng version
>
> -Jeff
>
>
> On Fri, May 24, 2013 at 3:38 AM, Ashish Tadose
> <[EMAIL PROTECTED]> wrote:
>
>     Hi All,
>
>     We are facing this issue in production flume setup.
>
>     The issue starts when the HDFS sink's BucketWriter fails to append
>     a batch to a file because of a Hadoop datanode problem.
>     It then tries to close that file, but close() also throws an
>     exception, and the HDFS sink does not remove that BucketWriter
>     instance. [The cause may be that the number of BucketWriter
>     instances is still below *maxOpenFiles*.]
>
>     What makes this worse is that the HDFS sink never lets go of that
>     BucketWriter instance and keeps trying to append to and close that
>     file indefinitely, which disrupts event delivery to HDFS.
>
>     *Logs from the flume agent:*
>     2013-05-20 02:32:40,669 (ResponseProcessor for block
>     blk_8857674139042547711_2143611) [WARN -
>     org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:3015)]
>     DFSOutputStream ResponseProcessor exception  for block
>     blk_8857674139042547711_2143611java.net.SocketTimeoutException:
>     69000 millis timeout while waiting for channel to be ready for
>     read. ch : java.nio.channels.SocketChannel[connected
>     local=/<datanode_ip>:41129 remote=/<datanode2_ip>:60010]
>             at
>     org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>             at
>     org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>             at
>     org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>             at java.io.DataInputStream.readFully(DataInputStream.java:178)
>             at java.io.DataInputStream.readLong(DataInputStream.java:399)
>             at
>     org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
>             at
>     org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2967)
>
>     2013-05-20 02:32:56,321 (DataStreamer for file
>     /flume/data/Flume_raw_1_.1368960029025.lzo.tmp block
>     blk_8857674139042547711_2143611) [WARN -
>     org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3051)]
>     *Error Recovery for block blk_8857674139042547711_2143611 bad
>     datanode[0] <datanode2_ip>:60010*
>     2013-05-20 02:32:56,322 (DataStreamer for file
>     /flume/data/Flume_raw_1_.1368960029025.lzo.tmp block
>     blk_8857674139042547711_2143611) [WARN -
>     org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3102)]
>     Error Recovery for block blk_8857674139042547711_2143611 in
>     pipeline <datanode2_ip>:60010, <datanode3_ip>:60010,
>     <datanode4_ip>:60010: bad datanode <datanode2_ip>:60010
>     2013-05-20 02:32:56,538 (Log-BackgroundWorker-fileChannel) [INFO -
>     org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint(EventQueueBackingStoreFile.java:109)]
>     Start checkpoint for
>     /home/flume/flume_channel/checkpointDir15/checkpoint, elements to
>     sync = 42000
>     2013-05-20 02:32:56,634 (Log-BackgroundWorker-fileChannel) [INFO -
>     org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint(EventQueueBackingStoreFile.java:117)]
>     Updating checkpoint metadata: logWriteOrderID: 1370969225649,
>     queueSize: 140156, queueHead: 72872626
>     2013-05-20 02:32:56,722 (Log-BackgroundWorker-fileChannel) [INFO -
>     org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint(LogFileV3.java:85)]
>     Updating log-543.meta currentPosition = 743847065, logWriteOrderID
>     = 1370969225649
>     2013-05-20 02:32:56,759 (Log-BackgroundWorker-fileChannel) [INFO -
Ron van der Vegt 2014-01-06, 11:54