Re: [jira] [Commented] (FLUME-1350) HDFS file handle not closed properly when date bucketing
My implementation is synchronized on the writer map, and the append and
close operations on the BucketWriter are synchronized. It is possible,
though rare, for a writer to get closed just before it is about to
append, but that is harmless: the append will just back off and get a
fresh writer on the next cycle. Also, if possible, please add comments
on the JIRA thread when the mail is generated from there :)
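
For readers skimming the thread, here is a minimal sketch of the scheme
described above (class and method names are hypothetical, not the actual
HDFSEventSink/BucketWriter code): the idle reaper synchronizes on the
writer map, append and close synchronize on the writer itself, and an
append that finds its writer already closed backs off and picks up a
fresh writer.

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch only -- names do not match the real
    // HDFSEventSink/BucketWriter classes.
    class IdleClosingSink {

        static class Writer {
            private boolean closed = false;
            private long lastWriteNanos = System.nanoTime();

            // append and close are synchronized on the writer itself
            synchronized boolean append(byte[] event) throws IOException {
                if (closed) {
                    return false; // closed under us; caller backs off and retries
                }
                lastWriteNanos = System.nanoTime();
                // ... write the event to the HDFS file ...
                return true;
            }

            synchronized void close() throws IOException {
                if (!closed) {
                    closed = true;
                    // ... flush and close the HDFS file handle ...
                }
            }

            synchronized boolean idleLongerThan(long nanos) {
                return System.nanoTime() - lastWriteNanos > nanos;
            }
        }

        private final Map<String, Writer> writers = new HashMap<>();

        // Append path: if our writer was closed between lookup and append,
        // back off and get a fresh writer on the next cycle.
        void process(String bucketPath, byte[] event) throws IOException {
            while (true) {
                Writer w;
                synchronized (writers) {
                    w = writers.computeIfAbsent(bucketPath, p -> new Writer());
                }
                if (w.append(event)) {
                    return;
                }
                synchronized (writers) {
                    writers.remove(bucketPath, w); // drop only the stale entry
                }
            }
        }

        // Reaper path: scan under the writer-map lock and close idle writers.
        void closeIdleWriters(long idleNanos) {
            synchronized (writers) {
                writers.values().removeIf(w -> {
                    if (w.idleLongerThan(idleNanos)) {
                        try {
                            w.close();
                        } catch (IOException e) {
                            // log and drop the writer anyway (sketch)
                        }
                        return true;
                    }
                    return false;
                });
            }
        }
    }

The rare close-before-append mentioned above shows up here as append()
returning false, and the back-off is simply a retry against a fresh map
entry.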

On 10/19/2012 05:13 AM, Roshan Naik wrote:
> Will need to handle race conditions, e.g. a thread resumes writing
> immediately after the watcher thread decides to close the file handle. In
> that sense a deterministic close is nicer than a timeout-based 'garbage
> collection'.
> -roshan
>
>
> On Thu, Oct 18, 2012 at 12:04 PM, Mike Percy (JIRA) <[EMAIL PROTECTED]> wrote:
>
>>      [
>> https://issues.apache.org/jira/browse/FLUME-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479255#comment-13479255]
>>
>> Mike Percy commented on FLUME-1350:
>> -----------------------------------
>>
>> Hi Juhani, something like a close-on-idle timeout makes sense. I'd be
>> happy to review it if you want to work on it.
>>
>>> HDFS file handle not closed properly when date bucketing
>>> ---------------------------------------------------------
>>>
>>>                  Key: FLUME-1350
>>>                  URL: https://issues.apache.org/jira/browse/FLUME-1350
>>>              Project: Flume
>>>           Issue Type: Bug
>>>           Components: Sinks+Sources
>>>     Affects Versions: v1.1.0, v1.2.0
>>>             Reporter: Robert Mroczkowski
>>>          Attachments: HDFSEventSink.java.patch
>>>
>>>
>>> With configuration:
>>> agent.sinks.hdfs-cafe-access.type = hdfs
>>> agent.sinks.hdfs-cafe-access.hdfs.path = hdfs://nga/nga/apache/access/%y-%m-%d/
>>> agent.sinks.hdfs-cafe-access.hdfs.fileType = DataStream
>>> agent.sinks.hdfs-cafe-access.hdfs.filePrefix = cafe_access
>>> agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 21600
>>> agent.sinks.hdfs-cafe-access.hdfs.rollSize = 10485760
>>> agent.sinks.hdfs-cafe-access.hdfs.rollCount = 0
>>> agent.sinks.hdfs-cafe-access.hdfs.txnEventMax = 1000
>>> agent.sinks.hdfs-cafe-access.hdfs.batchSize = 1000
>>> #agent.sinks.hdfs-cafe-access.hdfs.codeC = snappy
>>> agent.sinks.hdfs-cafe-access.hdfs.maxOpenFiles = 5000
>>> agent.sinks.hdfs-cafe-access.channel = memo-1
>>> When a new directory is created, the previous file handle remains open.
>>> The rollInterval setting applies only to files in the current date bucket.
>>
>> --
>> This message is automatically generated by JIRA.
>> If you think it was sent incorrectly, please contact your JIRA
>> administrators
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>
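
As a side-by-side with the idle-timeout sketch above, here is roughly
what the deterministic close Roshan prefers could look like (again a
hypothetical sketch, not the attached HDFSEventSink.java.patch): when
the formatted bucket path changes, e.g. the date in %y-%m-%d rolls over,
the old handle is closed before the new one is opened, so no watcher
thread and no close-vs-append race is needed.

    import java.io.IOException;

    // Hypothetical sketch -- names do not match the real Flume classes.
    class RolloverSink {

        static class Writer {
            void append(byte[] event) throws IOException { /* write to HDFS */ }
            void close() throws IOException { /* flush and close the handle */ }
        }

        private String currentPath;
        private Writer currentWriter;

        void process(String bucketPath, byte[] event) throws IOException {
            if (!bucketPath.equals(currentPath)) {
                if (currentWriter != null) {
                    currentWriter.close(); // close the old date bucket right away
                }
                currentPath = bucketPath;
                currentWriter = new Writer();
            }
            currentWriter.append(event);
        }
    }

The trade-off is that a bucket is only closed when a later event arrives
for a different path, whereas a close-on-idle timeout also releases
handles for buckets that simply stop receiving events.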