Flume >> mail # dev >> Re: [jira] [Commented] (FLUME-1350) HDFS file handle not closed properly when date bucketing


Re: [jira] [Commented] (FLUME-1350) HDFS file handle not closed properly when date bucketing
This patch has serious technical flaws. If you want this functionality, then
you just need to set hdfs.maxOpenFiles = 1.

However, for typical use I would strongly recommend setting hdfs.rollInterval = 300 and letting it roll every 5 minutes.

Regards,
Mike
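
For reference, a minimal sketch of the two settings suggested above. The agent and sink names are reused from the reporter's configuration quoted further down in this thread, which is an assumption for illustration; substitute your own. rollInterval is given in seconds, so 300 corresponds to the 5 minutes mentioned. The maxOpenFiles line is left commented out, since it is the alternative for getting the patch's behavior rather than the recommended setup.

```properties
# Sketch only: agent name "agent" and sink name "hdfs-cafe-access" are taken
# from the reporter's config and are placeholders for your own names.
agent.sinks.hdfs-cafe-access.type = hdfs
agent.sinks.hdfs-cafe-access.hdfs.path = hdfs://nga/nga/apache/access/%y-%m-%d/

# Recommended for typical use: close and roll each open file after 300 seconds.
agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 300

# Alternative that mimics the patch: allow only one open file at a time.
# agent.sinks.hdfs-cafe-access.hdfs.maxOpenFiles = 1
```

As far as I understand the HDFS sink, once the number of open files exceeds hdfs.maxOpenFiles the oldest one is closed, so with a value of 1 the previous date bucket's file would be closed when the new bucket's file is opened. With late events arriving from many hosts, that can produce many small files, which is why the rollInterval approach is the one recommended above.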

On Fri, Oct 12, 2012 at 3:51 PM, Justin Workman <[EMAIL PROTECTED]> wrote:

> I can confirm that we are seeing this issue as well. We roll only on
> rollSize and when the timestamp indicates it is time to create a new date
> bucket. The path and new file are created; however, the existing file is
> never closed and renamed.
>
> Applying this patch resolved the issue we were seeing and existing
> files are closed now when the new one is opened.
>
>
>
> Sent from my iPhone
>
> On Oct 12, 2012, at 4:41 PM, "Mike Percy (JIRA)" <[EMAIL PROTECTED]> wrote:
>
> >
> > [ https://issues.apache.org/jira/browse/FLUME-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475413#comment-13475413 ]
> >
> > Mike Percy commented on FLUME-1350:
> > -----------------------------------
> >
> > That path means that any Event that goes to the HDFS sink must have a
> > header called "timestamp" which is a stringified Long value, a typical Java
> > timestamp in milliseconds. The year-month-day will be generated from that
> > timestamp, and the event will be stored in a file under that directory.
> >
> > If there is already an open file in that directory, the event will be
> > appended to that file. If there is no open file in that directory, a new
> > file will be created.
> >
> > The only rules for closing a file are listed above, because when events
> > are collected from many hosts, there may be old events coming through at
> > the same time as new events, and we would not want to create too many small
> > files. So the time a file is allowed to remain open before it is
> > automatically closed is configurable using rollInterval.
> >
> >> HDFS file handle not closed properly when date bucketing
> >> ---------------------------------------------------------
> >>
> >>                Key: FLUME-1350
> >>                URL: https://issues.apache.org/jira/browse/FLUME-1350
> >>            Project: Flume
> >>         Issue Type: Bug
> >>         Components: Sinks+Sources
> >>   Affects Versions: v1.1.0, v1.2.0
> >>           Reporter: Robert Mroczkowski
> >>        Attachments: HDFSEventSink.java.patch
> >>
> >>
> >> With configuration:
> >> agent.sinks.hdfs-cafe-access.type = hdfs
> >> agent.sinks.hdfs-cafe-access.hdfs.path = hdfs://nga/nga/apache/access/%y-%m-%d/
> >> agent.sinks.hdfs-cafe-access.hdfs.fileType = DataStream
> >> agent.sinks.hdfs-cafe-access.hdfs.filePrefix = cafe_access
> >> agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 21600
> >> agent.sinks.hdfs-cafe-access.hdfs.rollSize = 10485760
> >> agent.sinks.hdfs-cafe-access.hdfs.rollCount = 0
> >> agent.sinks.hdfs-cafe-access.hdfs.txnEventMax = 1000
> >> agent.sinks.hdfs-cafe-access.hdfs.batchSize = 1000
> >> #agent.sinks.hdfs-cafe-access.hdfs.codeC = snappy
> >> agent.sinks.hdfs-cafe-access.hdfs.hdfs.maxOpenFiles = 5000
> >> agent.sinks.hdfs-cafe-access.channel = memo-1
> >> When a new directory is created, the previous file handle remains open.
> >> The rollInterval setting is applied only to files in the current date bucket.
> >
> > --
> > This message is automatically generated by JIRA.
> > If you think it was sent incorrectly, please contact your JIRA administrators
> > For more information on JIRA, see: http://www.atlassian.com/software/jira
>