Re: Writing reliably to HDFS
Juhani Connolly 2012-08-02, 03:45
On 08/02/2012 11:07 AM, バーチャル クリストファー wrote:
> I'm trying to write events to HDFS using Flume 1.2.0 and I have a couple
> of questions.
> Firstly, about the reliability semantics of the HdfsEventSink.
> My number one requirement is reliability, i.e. not losing any events.
> Ideally, by the time the HdfsEventSink commits the transaction, all
> events should be safely written to HDFS and visible to other clients, so
> that no data is lost even if the agent dies after that point. But what
> is actually happening in my tests is as follows:
> 1. The HDFS sink takes some events from the FileChannel and writes them
> to a SequenceFile on HDFS
> 2. The sink commits the transaction, and the FileChannel updates its
> checkpoint. As far as FileChannel is concerned, the events have been
> safely written to the sink.
> 3. Kill the agent.
> Result: I'm left with a weird tmp file on HDFS that is reported as zero
> bytes even though it actually contains data.
> The SequenceFile has not yet been closed and rolled over, so it is still
> a ".tmp" file. The data is actually in the HDFS blocks, but because the
> file was not closed, the NameNode thinks it has a length of 0 bytes. I'm
> not sure how to recover from this.
> Is this the expected behaviour of the HDFS sink, or am I doing something
> wrong? Do I need to explicitly enable HDFS append? (I am using HDFS
> I guess the problem is that data is not "safely" written until file
> rollover occurs, but the timing of file rollover (by time, log count,
> file size, etc.) is unrelated to the timing of transactions. Is there
> any way to put these in sync with each other?
Regarding reliability, I believe that while the file may not be closed,
you're not actually at risk of losing data. I suspect that adding in
some code to the sink shutdown to close up any temp files may be a good
idea. To deal with unexpected failure it may even be an idea to try
scanning the dest path for any unclosed files on startup.
I'm not really too familiar with the workings of the HDFS sink, so maybe
someone else can add more detail. In our test setup we have yet to see
any data loss from it.
> Second question: Could somebody please explain the reasoning behind the
> default values of the HDFS sink configuration? If I use the defaults,
> the sink generates zillions of tiny files (max 10 events per file),
> which as I understand it is not a recommended way to use HDFS.
> Is it OK to change these settings to generate much larger files (MB, GB
> scale)? Or should I write a script that periodically combines these tiny
> files into larger ones?
> Thanks for any advice,
> Chris Birchall.
There's no harm in changing those defaults, and I'd strongly recommend
doing so. We have most of the rolls switched off (set to 0) and we just
roll hourly (because that's how we want to separate our logs). You may
also want to change hdfs.batchSize, which defaults to 1 and is going to
cause a bottleneck if you have even a moderate amount of traffic. One
thing to note is that with large batches, it's possible for events to be
duplicated (if a batch is partially written and then hits an error, it
will be rolled back at the channel and then rewritten).
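
As a rough sketch of what such settings could look like in the agent's
properties file: the agent, channel and sink names (agent1, fileChannel1,
hdfsSink1) and the HDFS path are made up for illustration, the batch size
of 100 is an arbitrary value, and using hdfs.rollInterval for the hourly
roll is an assumption about how that roll is configured.

```properties
# Hypothetical names: agent1, fileChannel1, hdfsSink1
agent1.sinks.hdfsSink1.type = hdfs
agent1.sinks.hdfsSink1.channel = fileChannel1
# Placeholder path; point this at your own NameNode and directory
agent1.sinks.hdfsSink1.hdfs.path = hdfs://namenode.example.com/flume/events

# Switch off rolling by event count and by file size (0 disables the trigger)
agent1.sinks.hdfsSink1.hdfs.rollCount = 0
agent1.sinks.hdfsSink1.hdfs.rollSize = 0
# Roll by time only: 3600 seconds = one file per hour (assumed approach)
agent1.sinks.hdfsSink1.hdfs.rollInterval = 3600

# Take more than one event per transaction; 100 is illustrative only
agent1.sinks.hdfsSink1.hdfs.batchSize = 100
```

The larger the batch size, the more events may be written twice if a
batch fails partway through, so it is worth picking a value with that
trade-off in mind.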