HDFS sink data loss possible ?

Based on observations of our production Flume setup:

We have seen the file roll sink deliver almost 1% more events per day than
the HDFS sink. (We have a replicating setup, with a separate file channel
for each sink.)

Configuration:
=======
Flume version: 1.3.1
Flume topology: 30 first-tier machines and 3 second-tier machines (which
deliver to HDFS and the local file system)
HDFS compression codec: lzop
Channels: file channel for every source-sink pair
Hadoop version: 1.0.3 (Apache Hadoop)
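
For reference, a second-tier agent like the one described might be wired up
roughly as below. This is a sketch, not our actual config: the agent, channel,
sink names, port, and output directory are all invented; only the replicating
selector, the two file channels, the lzop codec, and the two sink types come
from the description above.

```
# Hypothetical second-tier agent; all names (a1, c_hdfs, c_file, ...) are made up
a1.sources = avroSrc
a1.channels = c_hdfs c_file
a1.sinks = k_hdfs k_file

# Replicate every incoming event into both file channels
a1.sources.avroSrc.type = avro
a1.sources.avroSrc.bind = 0.0.0.0
a1.sources.avroSrc.port = 4141
a1.sources.avroSrc.selector.type = replicating
a1.sources.avroSrc.channels = c_hdfs c_file

# One file channel per source-sink pair, as in our setup
a1.channels.c_hdfs.type = file
a1.channels.c_file.type = file

# HDFS sink with lzop compression
a1.sinks.k_hdfs.type = hdfs
a1.sinks.k_hdfs.channel = c_hdfs
a1.sinks.k_hdfs.hdfs.codeC = lzop
a1.sinks.k_hdfs.hdfs.fileType = CompressedStream

# Local file roll sink (this is the side that sees more events)
a1.sinks.k_file.type = file_roll
a1.sinks.k_file.channel = c_file
a1.sinks.k_file.sink.directory = /var/flume/out
```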

Things are generally working fine, but we see some data loss in HDFS
(not huge: about 1 million out of 1 billion events).
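
For scale, the loss rate implied by those numbers is simple arithmetic
(note this works out to 0.1%, an order of magnitude below the ~1% gap
mentioned above for the file roll comparison):

```python
# Rough loss rate implied by the figures in this post
total_events = 1_000_000_000   # ~1 billion events per day
missing = 1_000_000            # ~1 million events absent from HDFS

loss_pct = 100 * missing / total_events
print(f"{loss_pct:.2f}% of events missing")  # → 0.10% of events missing
```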

Is data loss possible in some scenario? (Also worth noting: the datanodes
of the Hadoop cluster are heavily loaded. Can that lead to problems?)