HDFS sink data loss possible?
Hi

Based on observations of our production Flume setup:

We have seen the file roll sink deliver almost 1% more events per day than
the HDFS sink. (We have a replicating setup, with a separate file channel
for each of the two sinks.)

Configuration:
=======
Flume version: 1.3.1
Flume topology: 30 first-tier machines and 3 second-tier machines (which
deliver to HDFS and the local file system)
HDFS compression codec: lzop
Channels: a file channel for every source-sink pair
Hadoop version: 1.0.3 (Apache Hadoop)
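
For context, here is a minimal sketch of how one of our second-tier agents is
wired (the agent, channel and sink names, port and paths below are
illustrative, not our actual property values): the source uses the default
replicating selector to put every event on both file channels, one drained by
the HDFS sink and one by the file roll sink.

agent2.sources  = avroSrc
agent2.channels = chHdfs chLocal
agent2.sinks    = hdfsSink rollSink

# Avro source; the replicating selector copies each event onto both channels
agent2.sources.avroSrc.type = avro
agent2.sources.avroSrc.bind = 0.0.0.0
agent2.sources.avroSrc.port = 4545
agent2.sources.avroSrc.selector.type = replicating
agent2.sources.avroSrc.channels = chHdfs chLocal

# One file channel per source-sink pair
agent2.channels.chHdfs.type = file
agent2.channels.chHdfs.checkpointDir = /flume/chHdfs/checkpoint
agent2.channels.chHdfs.dataDirs = /flume/chHdfs/data
agent2.channels.chLocal.type = file
agent2.channels.chLocal.checkpointDir = /flume/chLocal/checkpoint
agent2.channels.chLocal.dataDirs = /flume/chLocal/data

# HDFS sink with lzop compression
agent2.sinks.hdfsSink.type = hdfs
agent2.sinks.hdfsSink.channel = chHdfs
agent2.sinks.hdfsSink.hdfs.path = hdfs://namenode:8020/flume/events
agent2.sinks.hdfsSink.hdfs.fileType = CompressedStream
agent2.sinks.hdfsSink.hdfs.codeC = lzop

# File roll sink writing to the local file system
agent2.sinks.rollSink.type = file_roll
agent2.sinks.rollSink.channel = chLocal
agent2.sinks.rollSink.sink.directory = /data/flume/events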

Things are working fine overall, but we do see some data loss in HDFS (though
not very large: on the order of 1 million out of 1 billion events).

Is data loss possible with the HDFS sink in some scenario? (Just to add: the
datanodes of the Hadoop cluster are highly loaded. Can that lead to any
disaster?)

Regards,
Jagadish