Are you sure your HDFS cluster is configured properly? How big is the cluster?
The sink is complaining that your HDFS blocks are under-replicated relative to your configured replication factor, so it tries to get a sufficiently replicated pipeline by closing the current file and opening a new one to write to. Eventually it gives up.
That code is still there on trunk...
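To make that behaviour concrete, here is a minimal Java sketch of a rotate-on-under-replication loop with a give-up cap. It is a hypothetical model, not Flume's code: the class name, the `afterWrite` method, and the cap of 30 consecutive rotations are all assumptions. The `underReplicated` flag stands in for what `HDFSWriter.isUnderReplicated()` reports.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the behaviour described above (not Flume's code).
 * After each write, if the block is under-replicated, close the current file and
 * open a new one, but only up to MAX_CONSECUTIVE_ROTATIONS times in a row.
 */
public class UnderReplicationRotationSketch {
    // Assumed cap on back-to-back rotations caused by under-replication.
    static final int MAX_CONSECUTIVE_ROTATIONS = 30;

    private int consecutiveRotations = 0;
    private int fileIndex = 0;
    private final List<String> closedFiles = new ArrayList<>();

    /** Called after each batch; underReplicated stands in for HDFSWriter.isUnderReplicated(). */
    void afterWrite(boolean underReplicated) {
        if (!underReplicated) {
            consecutiveRotations = 0; // healthy pipeline: reset the counter
            return;
        }
        if (consecutiveRotations >= MAX_CONSECUTIVE_ROTATIONS) {
            return; // give up: keep writing to the current file
        }
        consecutiveRotations++;
        closedFiles.add("Log." + fileIndex); // "close" the current (small) file...
        fileIndex++;                         // ...and "open" a new one
    }

    int filesClosed() {
        return closedFiles.size();
    }

    public static void main(String[] args) {
        UnderReplicationRotationSketch w = new UnderReplicationRotationSketch();
        // 15 under-replicated writes right after a restart, then the pipeline recovers.
        for (int i = 0; i < 15; i++) w.afterWrite(true);
        for (int i = 0; i < 5; i++) w.afterWrite(false);
        System.out.println("small files closed: " + w.filesClosed()); // prints "small files closed: 15"
    }
}
```

Fifteen under-replicated writes in a row leave fifteen small closed files behind; once the cap is reached, the sketch stops rotating and keeps appending to the current file.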
On Aug 20, 2013, at 3:11 AM, Andrei <[EMAIL PROTECTED]> wrote:
> I have a Flume agent with a spool directory source and an HDFS sink. I have configured the sink to roll files only when they reach a certain (quite large) size (see full config below). However, when I restart Flume, it first generates ~15 small files (~500 bytes each) and only then starts writing a large file. In the Flume logs, at the time the small files are generated, I see the message "Block Under-replication detected. Rotating file".
> From the source code I've figured out several things:
> 1. This message is specific to Flume 1.3 and doesn't exist in the latest version.
> 2. It comes from the BucketWriter.shouldRotate() method, which in turn calls HDFSWriter.isUnderReplicated(); if that returns true, the above message is logged and the file is rotated.
> My questions are: why does this happen, and how do I fix it?
> Flume 1.3, CDH 4.3
> agent.sources = my-src
> agent.channels = my-ch
> agent.sinks = my-sink
> agent.sources.my-src.type = spooldir
> agent.sources.my-src.spoolDir = /flume/data
> agent.sources.my-src.channels = my-ch
> agent.sources.my-src.deletePolicy = immediate
> agent.sources.my-src.interceptors = tstamp-int
> agent.sources.my-src.interceptors.tstamp-int.type = timestamp
> agent.channels.my-ch.type = file
> agent.channels.my-ch.checkpointDir = /flume/checkpoint
> agent.channels.my-ch.dataDirs = /flume/channel-data
> agent.sinks.my-sink.type = hdfs
> agent.sinks.my-sink.hdfs.path = hdfs://my-hdfs:8020/logs
> agent.sinks.my-sink.hdfs.filePrefix = Log
> agent.sinks.my-sink.hdfs.batchSize = 10
> agent.sinks.my-sink.hdfs.rollInterval = 3600
> agent.sinks.my-sink.hdfs.rollCount = 0
> agent.sinks.my-sink.hdfs.rollSize = 134217728
> agent.sinks.my-sink.hdfs.fileType = DataStream
> agent.sinks.my-sink.channel = my-ch
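The roll settings in the config above can be plugged into a simplified model of the roll decision. This hypothetical Java sketch (not Flume's `BucketWriter`; class and method names are assumptions) shows why `hdfs.rollSize` alone does not prevent small files: the under-replication check triggers a rotation independently of the size and count limits. `hdfs.rollInterval` is time-based and is left out of the sketch.

```java
/**
 * Hypothetical model of the per-write roll checks, using the values from the
 * config above. rollInterval (time-based) is intentionally omitted.
 */
public class RollDecisionSketch {
    static final long ROLL_COUNT = 0;          // hdfs.rollCount: 0 disables count-based rolling
    static final long ROLL_SIZE = 134217728L;  // hdfs.rollSize: 128 MiB (128 * 1024 * 1024 bytes)

    static boolean shouldRotate(boolean underReplicated, long eventsInFile, long bytesInFile) {
        if (underReplicated) return true;                               // fires regardless of size/count
        if (ROLL_COUNT > 0 && eventsInFile >= ROLL_COUNT) return true;  // disabled by rollCount = 0
        if (ROLL_SIZE > 0 && bytesInFile >= ROLL_SIZE) return true;     // size limit reached
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldRotate(true, 10, 500));             // true: ~500-byte file is rolled
        System.out.println(shouldRotate(false, 10, 500));            // false
        System.out.println(shouldRotate(false, 1000000, ROLL_SIZE)); // true: 128 MiB reached
    }
}
```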