Flume >> mail # user >> HDFS Sink stops writing events because HDFSWriter failed to append and close a file


Thread started by Ashish Tadose, 2013-05-24, 10:38

Re: HDFS Sink stops writing events because HDFSWriter failed to append and close a file
Hi Ashish,

What version of Flume are you running?

flume-ng version

-Jeff
On Fri, May 24, 2013 at 3:38 AM, Ashish Tadose <[EMAIL PROTECTED]> wrote:

>  Hi All,
>
> We are facing this issue in our production Flume setup.
>
> The issue starts when the HDFS sink's BucketWriter fails to append a batch to a
> file because of a Hadoop datanode problem.
> It then tries to close that file, but even close() throws an exception and the
> HDFS sink does not remove that BucketWriter instance. [The cause may be that the
> number of open BucketWriter instances is still below *maxOpenFiles*.]
>
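For context, the cache of open BucketWriters referred to above is sized by the HDFS sink's hdfs.maxOpenFiles property. Below is a minimal sketch of the sink settings that govern this behaviour; the agent and sink names (a1, k1) and the concrete values are placeholders, and note that hdfs.closeTries / hdfs.retryInterval only exist in later Flume releases (1.6+), so they would not be available on a 2013-era agent.

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/data

# Size of the cache of open BucketWriters (default 5000). A writer whose
# close() keeps failing still occupies a slot until it is evicted.
a1.sinks.k1.hdfs.maxOpenFiles = 5000

# Milliseconds allowed for HDFS open/append/flush/close calls (default 10000).
a1.sinks.k1.hdfs.callTimeout = 10000

# Only in later Flume releases (1.6+): stop retrying close() on a file after
# this many attempts instead of retrying forever, pausing between attempts.
a1.sinks.k1.hdfs.closeTries = 3
a1.sinks.k1.hdfs.retryInterval = 180
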
> What makes things worse is that the HDFS sink never lets go of that
> BucketWriter instance and keeps trying to append to and close that file
> indefinitely, which disrupts event processing towards HDFS.
>
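In effect, the behaviour being asked for is that a writer whose append and close both fail gets dropped from the sink's open-file cache so a fresh file can be opened for the next batch. Below is a minimal, hypothetical Java sketch of that eviction idea; the OpenFileCache and Writer names are made up for illustration, and this is not Flume's actual BucketWriter/HDFSEventSink code.

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not Flume's actual HDFS sink code.
public class OpenFileCache {

    // Stand-in for a BucketWriter-like handle on one open HDFS file.
    interface Writer {
        void append(byte[] event) throws IOException;
        void close() throws IOException;
    }

    private final Map<String, Writer> openWriters = new HashMap<String, Writer>();

    // Append one event; if append fails and close also fails, evict the writer
    // so the next batch gets a fresh file instead of the same broken pipeline.
    public void append(String path, byte[] event) {
        Writer writer = openWriters.get(path);
        if (writer == null) {
            return; // nothing open for this path in this sketch
        }
        try {
            writer.append(event);
        } catch (IOException appendFailure) {
            try {
                writer.close(); // best-effort close of the broken file
            } catch (IOException closeFailure) {
                // close failed too (e.g. bad datanode in the write pipeline)
            } finally {
                // evict unconditionally so the sink is not stuck retrying forever
                openWriters.remove(path);
            }
        }
    }
}
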
> *Logs from the Flume agent:*
> 2013-05-20 02:32:40,669 (ResponseProcessor for block
> blk_8857674139042547711_2143611) [WARN -
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:3015)]
> DFSOutputStream ResponseProcessor exception  for block
> blk_8857674139042547711_2143611java.net.SocketTimeoutException: 69000
> millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/<datanode_ip>:41129
> remote=/<datanode2_ip>:60010]
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
>         at java.io.DataInputStream.readLong(DataInputStream.java:399)
>         at
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2967)
>
> 2013-05-20 02:32:56,321 (DataStreamer for file
> /flume/data/Flume_raw_1_.1368960029025.lzo.tmp block
> blk_8857674139042547711_2143611) [WARN -
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3051)]
> *Error Recovery for block blk_8857674139042547711_2143611 bad datanode[0]
> <datanode2_ip>:60010*
> 2013-05-20 02:32:56,322 (DataStreamer for file
> /flume/data/Flume_raw_1_.1368960029025.lzo.tmp block
> blk_8857674139042547711_2143611) [WARN -
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3102)]
> Error Recovery for block blk_8857674139042547711_2143611 in pipeline
> <datanode2_ip>:60010, <datanode3_ip>:60010, <datanode4_ip>:60010: bad
> datanode <datanode2_ip>:60010
> 2013-05-20 02:32:56,538 (Log-BackgroundWorker-fileChannel) [INFO -
> org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint(EventQueueBackingStoreFile.java:109)]
> Start checkpoint for /home/flume/flume_channel/checkpointDir15/checkpoint,
> elements to sync = 42000
> 2013-05-20 02:32:56,634 (Log-BackgroundWorker-fileChannel) [INFO -
> org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint(EventQueueBackingStoreFile.java:117)]
> Updating checkpoint metadata: logWriteOrderID: 1370969225649, queueSize:
> 140156, queueHead: 72872626
> 2013-05-20 02:32:56,722 (Log-BackgroundWorker-fileChannel) [INFO -
> org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint(LogFileV3.java:85)]
> Updating log-543.meta currentPosition = 743847065, logWriteOrderID = 1370969225649
> 2013-05-20 02:32:56,759 (Log-BackgroundWorker-fileChannel) [INFO -
> org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:886)] Updated
> checkpoint for file: /home/flume/flume_channel/dataDir15/log-543 position:
> 743847065 logWriteOrderID: 1370969225649
> 2013-05-20 02:33:56,331 (DataStreamer for file
> /flume/data/Flume_raw_1_.1368960029025.lzo.tmp block
> blk_8857674139042547711_2143611) [WARN -
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3123)]
Further messages in this thread:
Ashish Tadose 2013-05-29, 06:35
Ron van der Vegt 2014-01-06, 11:54