Flume >> mail # user >> HDFS Sink stops writing events because HDFSWriter failed to append and close a file


Re: HDFS Sink stops writing events because HDFSWriter failed to append and close a file
Hi Jeff,

We are using flume-ng version 1.3.

-Ashish
On Wednesday 29 May 2013 03:08 AM, Jeff Lord wrote:
> Hi Ashish,
>
> What version of flume are you running?
>
> flume-ng version
>
> -Jeff
>
>
> On Fri, May 24, 2013 at 3:38 AM, Ashish Tadose
> <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:
>
>     Hi All,
>
>     We are facing this issue in our production Flume setup.
>
>     The issue starts when the HDFS sink's BucketWriter fails to append
>     a batch to a file because of a Hadoop datanode problem. It then
>     tries to close that file, but close() also throws an exception, and
>     the HDFS sink does not remove that BucketWriter instance. [The
>     cause may be that the number of BucketWriter instances is below
>     *maxOpenFiles*.]
>
>     What makes this worse is that the HDFS sink never lets go of that
>     BucketWriter instance and keeps retrying the append/close on the
>     same file indefinitely, which disrupts event processing towards
>     HDFS.
>
>     *Logs from flume agent*
>     2013-05-20 02:32:40,669 (ResponseProcessor for block
>     blk_8857674139042547711_2143611) [WARN -
>     org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:3015)]
>     DFSOutputStream ResponseProcessor exception  for block
>     blk_8857674139042547711_2143611java.net.SocketTimeoutException:
>     69000 millis timeout while waiting for channel to be ready for
>     read. ch : java.nio.channels.SocketChannel[connected
>     local=/<datanode_ip>:41129 remote=/<datanode2_ip>:60010]
>             at
>     org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>             at
>     org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>             at
>     org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>             at java.io.DataInputStream.readFully(DataInputStream.java:178)
>             at java.io.DataInputStream.readLong(DataInputStream.java:399)
>             at
>     org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
>             at
>     org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2967)
>
>     2013-05-20 02:32:56,321 (DataStreamer for file
>     /flume/data/Flume_raw_1_.1368960029025.lzo.tmp block
>     blk_8857674139042547711_2143611) [WARN -
>     org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3051)]
>     *Error Recovery for block blk_8857674139042547711_2143611 bad
>     datanode[0] <datanode2_ip>:60010*
>     2013-05-20 02:32:56,322 (DataStreamer for file
>     /flume/data/Flume_raw_1_.1368960029025.lzo.tmp block
>     blk_8857674139042547711_2143611) [WARN -
>     org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3102)]
>     Error Recovery for block blk_8857674139042547711_2143611 in
>     pipeline <datanode2_ip>:60010, <datanode3_ip>:60010,
>     <datanode4_ip>:60010: bad datanode <datanode2_ip>:60010
>     2013-05-20 02:32:56,538 (Log-BackgroundWorker-fileChannel) [INFO -
>     org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint(EventQueueBackingStoreFile.java:109)]
>     Start checkpoint for
>     /home/flume/flume_channel/checkpointDir15/checkpoint, elements to
>     sync = 42000
>     2013-05-20 02:32:56,634 (Log-BackgroundWorker-fileChannel) [INFO -
>     org.apache.flume.channel.file.EventQueueBackingStoreFile.checkpoint(EventQueueBackingStoreFile.java:117)]
>     Updating checkpoint metadata: logWriteOrderID: 1370969225649,
>     queueSize: 140156, queueHead: 72872626
>     2013-05-20 02:32:56,722 (Log-BackgroundWorker-fileChannel) [INFO -
>     org.apache.flume.channel.file.LogFileV3$MetaDataWriter.markCheckpoint(LogFileV3.java:85)]
>     Updating log-543.meta currentPosition = 743847065, logWriteOrderID
>     = 1370969225649
>     2013-05-20 02:32:56,759 (Log-BackgroundWorker-fileChannel) [INFO -
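The failure mode Ashish describes — a cached writer whose append() and close() both throw, yet is never evicted, so every later event hits the same broken instance — can be sketched in isolation. This is a minimal illustration of the pattern, not Flume's actual BucketWriter code; all class and method names here are hypothetical:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a sink that caches one writer per file path.
// If a broken writer is evicted even when close() fails, the next
// event gets a fresh writer instead of retrying the same one forever.
class StuckWriterSketch {
    // Stand-in for a per-file bucket writer (illustrative, not Flume's).
    static class Writer {
        final boolean broken;
        Writer(boolean broken) { this.broken = broken; }
        void append(String event) throws IOException {
            if (broken) throw new IOException("append failed: datanode error");
        }
        void close() throws IOException {
            if (broken) throw new IOException("close failed: datanode error");
        }
    }

    final Map<String, Writer> openWriters = new HashMap<>();
    boolean nextWriterBroken = true; // first writer simulates the bad datanode

    Writer writerFor(String path) {
        return openWriters.computeIfAbsent(path, p -> {
            Writer w = new Writer(nextWriterBroken);
            nextWriterBroken = false; // replacement writers are healthy
            return w;
        });
    }

    // Returns true if the event was written.
    boolean process(String path, String event) {
        Writer w = writerFor(path);
        try {
            w.append(event);
            return true;
        } catch (IOException appendEx) {
            try {
                w.close();
            } catch (IOException closeEx) {
                // close() failed too -- swallow, but still evict below.
            } finally {
                // Evict the writer even on close() failure, so the next
                // event gets a fresh instance. Skipping this eviction is
                // the stuck-forever behavior reported in the thread.
                openWriters.remove(path);
            }
            return false;
        }
    }

    public static void main(String[] args) {
        StuckWriterSketch sink = new StuckWriterSketch();
        boolean first = sink.process("/flume/data/file.tmp", "e1");
        boolean second = sink.process("/flume/data/file.tmp", "e2");
        if (first) throw new AssertionError("broken writer should fail first append");
        if (!second) throw new AssertionError("sink should recover after eviction");
        System.out.println("recovered after evicting broken writer");
    }
}
```

The key point is the `finally` block: eviction must not depend on close() succeeding, otherwise a datanode error permanently poisons that file's writer slot.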