Re: [jira] [Commented] (FLUME-1350) HDFS file handle not closed properly when date bucketing
Mike Percy 2012-10-13, 00:07
This patch has serious technical flaws. If you want this functionality, you just need to set hdfs.maxOpenFiles = 1.
However, for typical use I would strongly recommend setting rollInterval = 300 and letting it roll every 5 minutes.
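A minimal sketch of the two options mentioned above, in the same properties format as the configuration quoted later in this thread (the agent and sink names here are illustrative, not from the original report):

```
# Option 1: allow only one open file, so the old bucket's file is
# closed as soon as a file in a new bucket is opened
agent.sinks.hdfs-sink.hdfs.maxOpenFiles = 1

# Option 2 (recommended for typical use): close and roll each file
# after 300 seconds, i.e. every 5 minutes
agent.sinks.hdfs-sink.hdfs.rollInterval = 300
```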
On Fri, Oct 12, 2012 at 3:51 PM, Justin Workman <[EMAIL PROTECTED]> wrote:
> I can confirm that we are seeing this issue as well. We are only using
> rollSize, and when the timestamp indicates it is time to create a new date
> bucket, the path and new file are created; however, the existing file is
> never closed and renamed.
> Applying this patch resolved the issue we were seeing, and existing
> files are now closed when the new one is opened.
> Sent from my iPhone
> On Oct 12, 2012, at 4:41 PM, "Mike Percy (JIRA)" <[EMAIL PROTECTED]> wrote:
> > Mike Percy commented on FLUME-1350:
> > -----------------------------------
> > That path means that any Event that goes to the HDFS sink must have a
> header called "timestamp" whose value is a stringified Long: a typical
> Java timestamp in milliseconds. The year-month-day will be generated from
> that timestamp, and the event will be stored in a file under that directory.
> > If there is already an open file in that directory, the event will be
> appended to that file. If there is no open file in that directory, a new
> file will be created.
> > The only rules for closing a file are the ones listed above. When events
> are collected from many hosts, old events may arrive at the same time as
> new events, and we do not want to create too many small files. So the time
> a file is allowed to remain open before it is automatically closed is
> configurable via rollInterval.
> >> HDFS file handle not closed properly when date bucketing
> >> ---------------------------------------------------------
> >> Key: FLUME-1350
> >> URL: https://issues.apache.org/jira/browse/FLUME-1350
> >> Project: Flume
> >> Issue Type: Bug
> >> Components: Sinks+Sources
> >> Affects Versions: v1.1.0, v1.2.0
> >> Reporter: Robert Mroczkowski
> >> Attachments: HDFSEventSink.java.patch
> >> With configuration:
> >> agent.sinks.hdfs-cafe-access.type = hdfs
> >> agent.sinks.hdfs-cafe-access.hdfs.path = hdfs://nga/nga/apache/access/%y-%m-%d/
> >> agent.sinks.hdfs-cafe-access.hdfs.fileType = DataStream
> >> agent.sinks.hdfs-cafe-access.hdfs.filePrefix = cafe_access
> >> agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 21600
> >> agent.sinks.hdfs-cafe-access.hdfs.rollSize = 10485760
> >> agent.sinks.hdfs-cafe-access.hdfs.rollCount = 0
> >> agent.sinks.hdfs-cafe-access.hdfs.txnEventMax = 1000
> >> agent.sinks.hdfs-cafe-access.hdfs.batchSize = 1000
> >> #agent.sinks.hdfs-cafe-access.hdfs.codeC = snappy
> >> agent.sinks.hdfs-cafe-access.hdfs.maxOpenFiles = 5000
> >> agent.sinks.hdfs-cafe-access.channel = memo-1
> >> When a new directory is created, the previous file handle remains open.
> The rollInterval setting is applied only to files in the current date bucket.
> > --
> > This message is automatically generated by JIRA.