Re: Flume Data Directory Cleanup
Thank you for your suggestion.  I took a careful look at that, and I'm not
sure it describes my situation.  That refers to the sink, while my problem
is with the channel.  I'm looking at a dramatic accumulation of log / meta
files within the channel data directory.

Additionally, I did try a manual cleanup of the channel data directory,
deleting the oldest log / meta files.  (This was my experiment.)  Flume
really did not like that.  If manual cleanup is required for the channel as
well, the cutoff point at which files go from in use to safe to delete is
not clear to me.
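
For reference, a file channel of the kind described here is configured
roughly along the lines below; the agent name, channel name, and paths are
illustrative placeholders, not values taken from this setup.  The
log-N / log-N.meta files are what accumulate under the dataDirs path.

agent.channels.ch1.type = file
agent.channels.ch1.checkpointDir = /var/lib/flume/checkpoint
agent.channels.ch1.dataDirs = /var/lib/flume/data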

-- Jeremy
On Thu, Jul 18, 2013 at 10:13 AM, Lenin Raj <[EMAIL PROTECTED]> wrote:

> Hi Jeremy,
>
> Regarding cleanup, it was discussed already once.
>
>
> http://mail-archives.apache.org/mod_mbox/flume-user/201306.mbox/%[EMAIL PROTECTED]%3E
>
> You have to do it manually.
>
>
> Thanks,
> Lenin
>
>
> On Thu, Jul 18, 2013 at 10:36 PM, Jeremy Karlson <[EMAIL PROTECTED]> wrote:
>
>> To follow up:
>>
>> My Flume agent ran out of disk space last night and appeared to stop
>> processing.  I shut it down and, as an experiment (it's a test machine,
>> why not?), deleted the oldest 10 data files to see whether Flume actually
>> needed them when it restarted.
>>
>> Flume was not happy with my choices.
>>
>> It spit out a lot of this:
>>
>> 2013-07-18 00:00:00,013 ERROR [pool-40-thread-1]
>>  o.a.f.s.AvroSource Avro source mySource: Unable to process event batch.
>> Exception follows. java.lang.IllegalStateException: Channel closed
>> [channel=myFileChannel]. Due to java.lang.NullPointerException: null
>>         at
>> org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:353)
>>         at
>> org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:122)
>>         ...
>> Caused by: java.lang.NullPointerException
>>         at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:895)
>>         at org.apache.flume.channel.file.Log.replay(Log.java:406)
>>         at
>> org.apache.flume.channel.file.FileChannel.start(FileChannel.java:303)
>>         ...
>>
>> So it seems like these files were actually in use, and not just leftover
>> cruft.  A worthwhile thing to know, but I'd like to understand why.  My
>> events are probably at most 1k of text, so it seems kind of odd to me that
>> they'd consume more than 50GB of disk space in the channel.
>>
>> -- Jeremy
>>
>>
>>
>> On Wed, Jul 17, 2013 at 3:24 PM, Jeremy Karlson <[EMAIL PROTECTED]> wrote:
>>
>>> Hi All,
>>>
>>> I have a very busy channel that has about 100,000 events queued up.  My
>>> data directory has about 50 data files, each about 1.6 GB.  I don't believe
>>> my 100k events could be consuming that much space, so I'm jumping to
>>> conclusions and assuming that most of these files are old and due for
>>> cleanup (but I suppose it's possible).  I'm not finding much guidance in
>>> the user guide on how often these files are cleaned up / removed /
>>> compacted / etc.
>>>
>>> Any thoughts on what's going on here, or what settings I should look
>>> for?  Thanks.
>>>
>>> -- Jeremy
>>>
>>
>>
>
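
Regarding which settings to look for: as far as I understand the file
channel's behavior, a data file is only removed once every event in it has
been taken and a later checkpoint has been written, so a slow or stopped
sink will keep old files pinned on disk.  The on-disk footprint is governed
mainly by the properties sketched below; the values shown are the
documented defaults, the channel name is taken from the log output above,
and the agent name is a placeholder.

agent.channels.myFileChannel.type = file
agent.channels.myFileChannel.checkpointDir = /path/to/checkpoint
agent.channels.myFileChannel.dataDirs = /path/to/data
# Roll to a new data file once the current one reaches this size (bytes).
agent.channels.myFileChannel.maxFileSize = 2146435071
# Maximum number of events the channel will hold.
agent.channels.myFileChannel.capacity = 1000000
# How often (ms) a checkpoint is written; fully taken data files are
# removed as part of checkpointing.
agent.channels.myFileChannel.checkpointInterval = 30000
# Stop accepting new events when free disk space falls below this (bytes).
agent.channels.myFileChannel.minimumRequiredSpace = 524288000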