Re: HDFS Sink stops writing events because HDFSWriter failed to append and close a file
Hi Ashish,

What version of Flume are you running?

flume-ng version

-Jeff
On Fri, May 24, 2013 at 3:38 AM, Ashish Tadose <[EMAIL PROTECTED]> wrote:

>  Hi All,
>
> We are facing this issue in our production Flume setup.
>
> The issue starts when the HDFS sink's BucketWriter fails to append a batch to
> a file because of a Hadoop datanode issue.
> It then tries to close that file, but close() also throws an exception, and
> the HDFS sink does not remove that BucketWriter instance. [The cause may be
> that the number of BucketWriter instances is below *maxOpenFiles*.]
>
> What makes this worse is that the HDFS sink does not let go of that
> BucketWriter instance and keeps trying to append to and close that file
> indefinitely, which disrupts event processing towards HDFS. (See the
> sketches after the logs below.)
>
> *Logs from flume agent*
> 2013-05-20 02:32:40,669 (ResponseProcessor for block
> blk_8857674139042547711_2143611) [WARN -
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:3015)]
> DFSOutputStream ResponseProcessor exception  for block
> blk_8857674139042547711_2143611java.net.SocketTimeoutException: 69000
> millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/<datanode_ip>:41129
> remote=/<datanode2_ip>:60010]
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
>         at java.io.DataInputStream.readLong(DataInputStream.java:399)
>         at
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2967)
>
> 2013-05-20 02:32:56,321 (DataStreamer for file
> /flume/data/Flume_raw_1_.1368960029025.lzo.tmp block
> blk_8857674139042547711_2143611) [WARN -
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3051)]
> *Error Recovery for block blk_8857674139042547711_2143611 bad datanode[0]
> <datanode2_ip>:60010*
> 2013-05-20 02:32:56,322 (DataStreamer for file
> /flume/data/Flume_raw_1_.1368960029025.lzo.tmp block
> blk_8857674139042547711_2143611) [WARN -
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3102)]
> Error Recovery for block blk_8857674139042547711_2143611 in pipeline
> <datanode2_ip>:60010, <datanode3_ip>:60010, <datanode4_ip>:60010: bad
> datanode <datanode2_ip>:60010
> 2013-05-20 02:32:56,538 (Log-BackgroundWorker-fileChannel) [INFO -
> org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint(EventQueueBackingStoreFile.java:109)]
> Start checkpoint for /home/flume/flume_channel/checkpointDir15/checkpoint,
> elements to sync = 42000
> 2013-05-20 02:32:56,634 (Log-BackgroundWorker-fileChannel) [INFO -
> org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint(EventQueueBackingStoreFile.java:117)]
> Updating checkpoint metadata: logWriteOrderID: 1370969225649, queueSize:
> 140156, queueHead: 72872626
> 2013-05-20 02:32:56,722 (Log-BackgroundWorker-fileChannel) [INFO -
> org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint(LogFileV3.java:85)]
> Updating log-543.meta currentPosition = 743847065, logWriteOrderID = 1370969225649
> 2013-05-20 02:32:56,759 (Log-BackgroundWorker-fileChannel) [INFO -
> org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:886)] Updated
> checkpoint for file: /home/flume/flume_channel/dataDir15/log-543 position:
> 743847065 logWriteOrderID: 1370969225649
> 2013-05-20 02:33:56,331 (DataStreamer for file
> /flume/data/Flume_raw_1_.1368960029025.lzo.tmp block
> blk_8857674139042547711_2143611) [WARN -
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3123)]
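
For reference on the *maxOpenFiles* setting mentioned above: hdfs.maxOpenFiles
bounds how many files the HDFS sink keeps open at once (the default is 5000),
and hdfs.callTimeout bounds how long HDFS calls such as open, write, flush, and
close may take before failing. A minimal sink configuration touching both might
look like the sketch below; the agent and sink names ("agent", "hdfsSink") and
the chosen values are illustrative, not taken from this thread.

    agent.sinks.hdfsSink.type = hdfs
    agent.sinks.hdfsSink.hdfs.path = /flume/data
    # Upper bound on concurrently open BucketWriters; the oldest is
    # closed only when this number is exceeded.
    agent.sinks.hdfsSink.hdfs.maxOpenFiles = 500
    # Milliseconds allowed for HDFS operations such as append and close.
    agent.sinks.hdfsSink.hdfs.callTimeout = 30000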
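
The failure mode Ashish describes (a broken BucketWriter retried forever
because eviction happens only at capacity) can be illustrated with a
simplified, self-contained Java sketch. This is not Flume's actual
implementation; Writer and WriterCacheSketch are hypothetical names used
only to show why a failing writer below *maxOpenFiles* is never dropped.

    import java.io.IOException;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch of a size-bounded writer cache that evicts only when it grows
    // past maxOpenFiles. A writer whose append() and close() both keep
    // failing is never removed while the cache is under capacity, so every
    // new batch for that path retries the same broken writer.
    public class WriterCacheSketch {

        interface Writer {
            void append(String event) throws IOException;
            void close() throws IOException;
        }

        private final Map<String, Writer> writers;

        WriterCacheSketch(int maxOpenFiles) {
            // Access-ordered LRU map; removeEldestEntry fires only once
            // the map holds more than maxOpenFiles writers.
            this.writers = new LinkedHashMap<String, Writer>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Writer> eldest) {
                    return size() > maxOpenFiles;
                }
            };
        }

        void process(String path, String event, Writer freshWriter) {
            Writer w = writers.computeIfAbsent(path, p -> freshWriter);
            try {
                w.append(event);
            } catch (IOException appendFailure) {
                try {
                    w.close(); // if close() also throws, the writer stays cached
                } catch (IOException closeFailure) {
                    // nothing removes writers.get(path) here, so the next
                    // batch for this path hits the same broken writer again
                }
            }
        }
    }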