Flume, mail # user - Block Under-replication detected. Rotating file.


Re: Block Under-replication detected. Rotating file.
Andrei 2013-08-22, 10:47
Hi Mike,

it makes sense - the replication factor is indeed less than recommended: we
test Hadoop on 2 large machines and thus replication is set to 1, but HDFS
seems to ignore the config and still replicates each block 3 times. I was
confused that the small files were generated *before* the normal large file,
but if Flume has a counter for replication attempts, that explains it.

Thanks.
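
The "ignored" replication setting has a common cause: the HDFS replication
factor is assigned per file by the client at create time, so it is the Flume
agent (acting as the HDFS client) that must see dfs.replication = 1, for
example via an hdfs-site.xml on its classpath; otherwise the client-side
default of 3 applies no matter what the NameNode itself is configured with.
A minimal sketch of the relevant settings follows; note that the
hdfs.minBlockReplicas sink option appeared in later Flume releases and may
not exist in this 1.3 build.

hdfs-site.xml (on the Flume agent's classpath)
----------------------------------------------
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

flume.config (only if this Flume build supports the option)
-----------------------------------------------------------
agent.sinks.my-sink.hdfs.minBlockReplicas = 1

The replication factor actually recorded for the written files can be
verified with: hadoop fsck /logs -files -blocks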

On Thu, Aug 22, 2013 at 1:13 PM, Mike Percy <[EMAIL PROTECTED]> wrote:

> Are you sure your HDFS cluster is configured properly? How big is the
> cluster?
>
> It's complaining that your HDFS blocks are not replicated enough based on
> your configured replication factor, and it tries to get a sufficiently
> replicated pipeline by closing the current file and opening a new one to
> write to. Eventually it gives up.
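
A minimal sketch of the rotation check being described (class, field and
method names and the retry bound are illustrative assumptions, not the
actual Flume 1.3 source):

abstract class RotationCheckSketch {
  private static final int MAX_CONSEC_UNDER_REPL_ROTATIONS = 30; // assumed bound
  private int consecutiveUnderReplRotations = 0;

  // In Flume this would ask the HDFS writer how many live replicas the
  // block currently being written has, and compare against the expected
  // replication factor of the file.
  abstract boolean isUnderReplicated();

  boolean shouldRotate() {
    if (isUnderReplicated()) {
      if (consecutiveUnderReplRotations < MAX_CONSEC_UNDER_REPL_ROTATIONS) {
        consecutiveUnderReplRotations++;
        System.err.println("Block Under-replication detected. Rotating file.");
        return true; // close the current file and open a new one
      }
      return false; // "it gives up": stop rotating despite under-replication
    }
    consecutiveUnderReplRotations = 0; // a healthy pipeline resets the counter
    return false; // size/count/interval roll triggers not shown here
  }
}

With only 2 datanodes and files created with a replication factor of 3, the
check never comes back healthy, so each newly opened file is rotated almost
immediately; that would produce exactly the burst of small files reported
after a restart.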
>
> That code is still there on trunk...
>
> Mike
>
> Sent from my iPhone
>
> On Aug 20, 2013, at 3:11 AM, Andrei <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I have a Flume agent with a spooling directory source and an HDFS sink. I
> have configured the sink to roll files only when they reach a certain
> (quite large) size (see full config below). However, when I *restart*
> Flume, it first generates ~15 small files (~500 bytes each) and only after
> that starts writing the large file. In the Flume logs, at the time the
> small files are generated, I see the message "Block Under-replication
> detected. Rotating file".
>
> From the source code I've figured out several things:
>
> 1. This message is specific to Flume 1.3 and doesn't exist in the latest
> version.
> 2. It comes from the BucketWriter.shouldRotate() method, which in turn
> calls HDFSWriter.isUnderReplicated(); if that returns true, the above
> message is logged and the file is rotated (this is the check sketched
> above).
>
> My questions are: why does this happen, and how do I fix it?
>
>
> Flume 1.3, CDH 4.3
>
> flume.config
> -----------------
>
> agent.sources = my-src
> agent.channels = my-ch
> agent.sinks = my-sink
>
> agent.sources.my-src.type = spooldir
> agent.sources.my-src.spoolDir = /flume/data
> agent.sources.my-src.channels = my-ch
> agent.sources.my-src.deletePolicy = immediate
> agent.sources.my-src.interceptors = tstamp-int
> agent.sources.my-src.interceptors.tstamp-int.type = timestamp
>
> agent.channels.my-ch.type = file
> agent.channels.my-ch.checkpointDir = /flume/checkpoint
> agent.channels.my-ch.dataDirs = /flume/channel-data
>
> agent.sinks.my-sink.type = hdfs
> agent.sinks.my-sink.hdfs.path = hdfs://my-hdfs:8020/logs
> agent.sinks.my-sink.hdfs.filePrefix = Log
> agent.sinks.my-sink.hdfs.batchSize = 10
> agent.sinks.my-sink.hdfs.rollInterval = 3600
> agent.sinks.my-sink.hdfs.rollCount = 0
> agent.sinks.my-sink.hdfs.rollSize = 134217728
> agent.sinks.my-sink.hdfs.fileType = DataStream
> agent.sinks.my-sink.channel = my-ch
>
>