Re: [jira] [Commented] (FLUME-1350) HDFS file handle not closed properly when date bucketing
Juhani Connolly 2012-10-19, 10:00
My implementation synchronizes on the writer map, and the append and
close operations on the BucketWriter are synchronized. It is possible,
though rare, for a writer to get closed just before it is about to append,
but that is harmless: the sink just backs off and gets a fresh writer on
the next cycle. Also, if possible, please add comments on the JIRA thread
when the mail is generated from there :)
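
A minimal sketch of the locking pattern described above, using hypothetical
simplified stand-ins (IdleCloseSketch, Writer) rather than the actual
HDFSEventSink and BucketWriter classes:

// Sketch only: illustrates synchronizing on the writer map, synchronized
// append/close on each writer, and the harmless close-before-append race.
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class IdleCloseSketch {

    /** Stand-in for a BucketWriter: append and close are synchronized. */
    static class Writer {
        private boolean closed = false;
        private long lastWrite = System.currentTimeMillis();

        synchronized boolean append(String event) {
            if (closed) {
                return false;      // caller backs off and retries with a fresh writer
            }
            lastWrite = System.currentTimeMillis();
            // ... write the event to the open HDFS file ...
            return true;
        }

        synchronized void closeIfIdle(long idleMillis) {
            if (!closed && System.currentTimeMillis() - lastWrite > idleMillis) {
                closed = true;
                // ... close the underlying HDFS file handle ...
            }
        }

        synchronized boolean isClosed() {
            return closed;
        }
    }

    private final Map<String, Writer> writers = new HashMap<>();
    private final long idleMillis;

    IdleCloseSketch(long idleMillis) {
        this.idleMillis = idleMillis;
    }

    /** Called by the sink for every event; access to the map is synchronized. */
    void write(String bucketPath, String event) {
        Writer w;
        synchronized (writers) {
            w = writers.computeIfAbsent(bucketPath, p -> new Writer());
        }
        if (!w.append(event)) {
            // Rare race: the idle reaper closed this writer just before the append.
            // Harmless: drop the stale entry; the next cycle gets a fresh writer.
            synchronized (writers) {
                writers.remove(bucketPath, w);
            }
        }
    }

    /** Called periodically by a watcher thread to release stale file handles. */
    void reapIdleWriters() {
        synchronized (writers) {
            Iterator<Map.Entry<String, Writer>> it = writers.entrySet().iterator();
            while (it.hasNext()) {
                Writer w = it.next().getValue();
                w.closeIfIdle(idleMillis);
                if (w.isClosed()) {
                    it.remove();
                }
            }
        }
    }
}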

On 10/19/2012 05:13 AM, Roshan Naik wrote:
> Will need to handle race conditions, e.g. a thread resumes writing
> immediately after the watcher thread decides to close the file handle. In
> that sense a deterministic close is nicer than a timeout-based 'garbage
> collection'.
> -roshan
>
>
> On Thu, Oct 18, 2012 at 12:04 PM, Mike Percy (JIRA) <[EMAIL PROTECTED]> wrote:
>
>>      [
>> https://issues.apache.org/jira/browse/FLUME-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479255#comment-13479255]
>>
>> Mike Percy commented on FLUME-1350:
>> -----------------------------------
>>
>> Hi Juhani, something like a close-on-idle timeout makes sense. I'd be
>> happy to review it if you want to work on it.
>>
>>> HDFS file handle not closed properly when date bucketing
>>> ---------------------------------------------------------
>>>
>>>                  Key: FLUME-1350
>>>                  URL: https://issues.apache.org/jira/browse/FLUME-1350
>>>              Project: Flume
>>>           Issue Type: Bug
>>>           Components: Sinks+Sources
>>>     Affects Versions: v1.1.0, v1.2.0
>>>             Reporter: Robert Mroczkowski
>>>          Attachments: HDFSEventSink.java.patch
>>>
>>>
>>> With configuration:
>>> agent.sinks.hdfs-cafe-access.type = hdfs
>>> agent.sinks.hdfs-cafe-access.hdfs.path = hdfs://nga/nga/apache/access/%y-%m-%d/
>>> agent.sinks.hdfs-cafe-access.hdfs.fileType = DataStream
>>> agent.sinks.hdfs-cafe-access.hdfs.filePrefix = cafe_access
>>> agent.sinks.hdfs-cafe-access.hdfs.rollInterval = 21600
>>> agent.sinks.hdfs-cafe-access.hdfs.rollSize = 10485760
>>> agent.sinks.hdfs-cafe-access.hdfs.rollCount = 0
>>> agent.sinks.hdfs-cafe-access.hdfs.txnEventMax = 1000
>>> agent.sinks.hdfs-cafe-access.hdfs.batchSize = 1000
>>> #agent.sinks.hdfs-cafe-access.hdfs.codeC = snappy
>>> agent.sinks.hdfs-cafe-access.hdfs.maxOpenFiles = 5000
>>> agent.sinks.hdfs-cafe-access.channel = memo-1
>>> When a new directory is created, the previous file handle remains open.
>>> The rollInterval setting is applied only to files in the current date bucket.
>>
>> --
>> This message is automatically generated by JIRA.
>> If you think it was sent incorrectly, please contact your JIRA
>> administrators
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>
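
For context, the close-on-idle fix discussed above would amount to one extra
setting in a configuration like the one quoted. A minimal sketch, assuming a
hypothetical hdfs.idleTimeout parameter giving the seconds of inactivity after
which a bucket's file is closed (not part of the quoted configuration):

agent.sinks.hdfs-cafe-access.hdfs.idleTimeout = 300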