Justin Workman 2012-10-12, 22:51
Re: [jira] [Commented] (FLUME-1350) HDFS file handle not closed properly when date bucketing
This patch has serious technical flaws. If you want this behavior, you
just need to set hdfs.maxOpenFiles = 1.

However, for typical use I would strongly recommend setting rollInterval = 300 and letting it roll every 5 minutes.
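For reference, the two settings above would look like this in the sink configuration. This is a sketch only; the sink name is borrowed from the reporter's config quoted further down.

```
# roll (close and rename) each file every 5 minutes
agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 300
# alternatively, force the previous file closed whenever a new one opens
agent.sinks.hdfs-cafe-access.hdfs.maxOpenFiles = 1
```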

Regards,
Mike

On Fri, Oct 12, 2012 at 3:51 PM, Justin Workman <[EMAIL PROTECTED]> wrote:

> I can confirm that we are seeing this issue as well. We are only using
> rollSize, and when the timestamp indicated it was time to create a new date
> bucket, the path and new file were created, but the existing file was
> never closed and renamed.
>
> Applying this patch resolved the issue we were seeing; existing
> files are now closed when the new one is opened.
>
>
>
> Sent from my iPhone
>
> On Oct 12, 2012, at 4:41 PM, "Mike Percy (JIRA)" <[EMAIL PROTECTED]> wrote:
>
> >
> >    [
> https://issues.apache.org/jira/browse/FLUME-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475413#comment-13475413]
> >
> > Mike Percy commented on FLUME-1350:
> > -----------------------------------
> >
> > That path means that any Event that goes to the HDFS sink must have a
> header called "timestamp" containing a stringified Long value, i.e. a
> typical Java epoch timestamp in milliseconds. The year-month-day will be
> generated from that timestamp, and the event will be stored in a file
> under that directory.
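[Editor's note: as an illustration of how such a date bucket is derived from the header, here is a small Python sketch. The helper name is made up and the base path is taken from the reporter's config below; this is not Flume's actual code.]

```python
import datetime

def bucket_path(ts_header: str,
                base: str = "hdfs://nga/nga/apache/access") -> str:
    """Resolve a %y-%m-%d date bucket from a Flume "timestamp" header.

    ts_header is a stringified Java epoch timestamp in milliseconds,
    e.g. "1350082271000". The base path is illustrative only.
    """
    seconds = int(ts_header) / 1000.0
    d = datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)
    return f"{base}/{d:%y-%m-%d}/"

# "1350082271000" is 2012-10-12 22:51:11 UTC
print(bucket_path("1350082271000"))  # hdfs://nga/nga/apache/access/12-10-12/
```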
> >
> > If there is already an open file in that directory, the event will be
> appended to that file. If there is no open file in that directory, a new
> file will be created.
> >
> > The only rules for closing a file are the ones listed above. When events
> are collected from many hosts, old events may arrive at the same time as
> new ones, and we do not want to create too many small files. So the time a
> file may remain open before it is automatically closed is configurable via
> rollInterval.
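[Editor's note: a minimal sketch of that time-based roll, assuming a map from bucket path to an open writer tagged with its open time. All names here are hypothetical; this is not Flume's implementation.]

```python
ROLL_INTERVAL = 300  # seconds, mirrors hdfs.rollInterval

class BucketWriter:
    """Stand-in for an open HDFS file writer (hypothetical)."""
    def __init__(self, path: str, now: float):
        self.path = path
        self.opened_at = now
        self.closed = False
    def close(self):
        self.closed = True

def roll_expired(writers: dict, now: float,
                 roll_interval: float = ROLL_INTERVAL) -> list:
    """Close and drop every writer open longer than roll_interval."""
    expired = [p for p, w in writers.items()
               if now - w.opened_at >= roll_interval]
    for p in expired:
        writers[p].close()
        del writers[p]
    return expired
```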
> >
> >> HDFS file handle not closed properly when date bucketing
> >> ---------------------------------------------------------
> >>
> >>                Key: FLUME-1350
> >>                URL: https://issues.apache.org/jira/browse/FLUME-1350
> >>            Project: Flume
> >>         Issue Type: Bug
> >>         Components: Sinks+Sources
> >>   Affects Versions: v1.1.0, v1.2.0
> >>           Reporter: Robert Mroczkowski
> >>        Attachments: HDFSEventSink.java.patch
> >>
> >>
> >> With configuration:
> >> agent.sinks.hdfs-cafe-access.type = hdfs
> >> agent.sinks.hdfs-cafe-access.hdfs.path = hdfs://nga/nga/apache/access/%y-%m-%d/
> >> agent.sinks.hdfs-cafe-access.hdfs.fileType = DataStream
> >> agent.sinks.hdfs-cafe-access.hdfs.filePrefix = cafe_access
> >> agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 21600
> >> agent.sinks.hdfs-cafe-access.hdfs.rollSize = 10485760
> >> agent.sinks.hdfs-cafe-access.hdfs.rollCount = 0
> >> agent.sinks.hdfs-cafe-access.hdfs.txnEventMax = 1000
> >> agent.sinks.hdfs-cafe-access.hdfs.batchSize = 1000
> >> #agent.sinks.hdfs-cafe-access.hdfs.codeC = snappy
> >> agent.sinks.hdfs-cafe-access.hdfs.hdfs.maxOpenFiles = 5000
> >> agent.sinks.hdfs-cafe-access.channel = memo-1
> >> When a new directory is created, the previous file handle remains open.
> The rollInterval setting is applied only to files in the current date bucket.
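[Editor's note: the behavior the attached patch aims for can be sketched as follows. Names are hypothetical and this is not the actual HDFSEventSink patch: when an event maps to a new date bucket, any writer still open for an old bucket is closed before the new file opens. This is effectively what hdfs.maxOpenFiles = 1 enforces, as noted in the reply above.]

```python
def get_writer(writers: dict, bucket: str, open_fn):
    """Return the writer for bucket, closing stale ones first.

    writers maps bucket path -> open writer object; open_fn opens a
    new writer for a path. Hypothetical sketch of the patch's intent.
    """
    if bucket not in writers:
        # a new date bucket: close and drop every previously open writer
        for old in list(writers):
            writers.pop(old).close()
        writers[bucket] = open_fn(bucket)
    return writers[bucket]
```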
> >
>
Roshan Naik 2012-10-18, 20:13
Juhani Connolly 2012-10-19, 10:00