It makes sense: the replication factor really is lower than recommended. We
test Hadoop on 2 large machines, so replication is set to 1, but HDFS
seems to ignore the config and still expects each block to be replicated 3
times. I was confused that the small files are generated *before* the normal
large files, but if Flume has some counter for replication attempts, that
explains it.
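For the record, one workaround that may apply here is the HDFS sink's
`hdfs.minBlockReplicas` property (check that it exists in your Flume
version); another is making sure the client-side `dfs.replication` that the
Flume HDFS client picks up matches the cluster. A sketch, using the sink
name from the config below:

```
# Possible workaround 1: tell the sink to only require 1 replica
# (assumes hdfs.minBlockReplicas is supported by your Flume version)
agent.sinks.my-sink.hdfs.minBlockReplicas = 1

# Possible workaround 2: set the client-side replication factor in the
# hdfs-site.xml on Flume's classpath, so writes request only 1 replica:
#   <property>
#     <name>dfs.replication</name>
#     <value>1</value>
#   </property>
```

The second option matters because replication is decided by the writing
client, not only by the NameNode, so a Flume agent without the cluster's
hdfs-site.xml can fall back to the default of 3.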
On Thu, Aug 22, 2013 at 1:13 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
> Are you sure your HDFS cluster is configured properly? How big is the
> It's complaining that your HDFS blocks are not replicated enough based on
> your configured replication factor, and tries to get a sufficiently
> replicated pipeline by closing the current file and opening a new one to
> write to. Finally it gives up.
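The behavior Mike describes can be sketched roughly as follows. This is an
illustrative reconstruction, not the actual Flume source; all names
(`UnderReplicationRotator`, `maxRotateAttempts`, etc.) are hypothetical:

```java
// Illustrative sketch of the described behavior: when the write pipeline
// has fewer replicas than configured, rotate the file a limited number of
// times to get a fresh pipeline, then give up. Not actual Flume code.
public class UnderReplicationRotator {
    private final int maxRotateAttempts;          // give up after this many rotations
    private int consecutiveUnderReplRotations = 0;

    public UnderReplicationRotator(int maxRotateAttempts) {
        this.maxRotateAttempts = maxRotateAttempts;
    }

    /** Returns true if the current file should be closed and a new one opened. */
    public boolean shouldRotate(int actualReplication, int configuredReplication) {
        boolean underReplicated = actualReplication < configuredReplication;
        if (underReplicated && consecutiveUnderReplRotations < maxRotateAttempts) {
            consecutiveUnderReplRotations++;
            return true;                          // rotate: close file, open new pipeline
        }
        if (!underReplicated) {
            consecutiveUnderReplRotations = 0;    // a healthy pipeline resets the counter
        }
        return false;                             // either healthy, or we gave up
    }

    public static void main(String[] args) {
        UnderReplicationRotator r = new UnderReplicationRotator(3);
        // Replication configured to 3 but only 1 replica achievable
        // (e.g. a 2-node cluster): the sink keeps rotating, then gives up.
        System.out.println(r.shouldRotate(1, 3)); // true  (rotation 1)
        System.out.println(r.shouldRotate(1, 3)); // true  (rotation 2)
        System.out.println(r.shouldRotate(1, 3)); // true  (rotation 3)
        System.out.println(r.shouldRotate(1, 3)); // false (gave up)
    }
}
```

This also explains the burst of small files on restart: each rotation closes
a nearly empty file before the roll-size/roll-interval settings ever apply.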
> That code is still there on trunk...
> Sent from my iPhone
> On Aug 20, 2013, at 3:11 AM, Andrei <[EMAIL PROTECTED]> wrote:
> I have Flume agent with spool directory as source and HDFS sink. I have
> configured sink to roll files only when they reach some (quite large) size
> (see full config below). However, when I *restart* Flume, it first
> generates ~15 small files (~500 bytes) and only after that starts writing
> large file. In Flume logs at the time of generating small files I see
> message "Block Under-replication detected. Rotating file".
> From source code I've figured out several things:
> 1. This message is specific to Flume 1.3 and doesn't exist in the latest
> version.
> 2. It comes from the BucketWriter.shouldRotate() method, which in turn calls
> HDFSWriter.isUnderReplicated(); if that returns true, the above message is
> generated and the file is rotated.
> My questions are: why does this happen, and how do I fix it?
> Flume 1.3 (CDH 4.3):
> agent.sources = my-src
> agent.channels = my-ch
> agent.sinks = my-sink
> agent.sources.my-src.type = spooldir
> agent.sources.my-src.spoolDir = /flume/data
> agent.sources.my-src.channels = my-ch
> agent.sources.my-src.deletePolicy = immediate
> agent.sources.my-src.interceptors = tstamp-int
> agent.sources.my-src.interceptors.tstamp-int.type = timestamp
> agent.channels.my-ch.type = file
> agent.channels.my-ch.checkpointDir = /flume/checkpoint
> agent.channels.my-ch.dataDirs = /flume/channel-data
> agent.sinks.my-sink.type = hdfs
> agent.sinks.my-sink.hdfs.path = hdfs://my-hdfs:8020/logs
> agent.sinks.my-sink.hdfs.filePrefix = Log
> agent.sinks.my-sink.hdfs.batchSize = 10
> agent.sinks.my-sink.hdfs.rollInterval = 3600
> agent.sinks.my-sink.hdfs.rollCount = 0
> agent.sinks.my-sink.hdfs.rollSize = 134217728
> agent.sinks.my-sink.hdfs.fileType = DataStream
> agent.sinks.my-sink.channel = my-ch