Flume user mailing list: Flume Data Directory Cleanup


Re: Flume Data Directory Cleanup
Lenin Raj 2013-07-18, 17:13
Hi Jeremy,

Regarding cleanup, this was discussed here once before:

http://mail-archives.apache.org/mod_mbox/flume-user/201306.mbox/%[EMAIL PROTECTED]%3E

You have to do it manually.
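
[Editor's note: for reference, a minimal file channel configuration sketch. The agent name and the channel's paths below are hypothetical; the property names are the standard Flume FileChannel ones. `maxFileSize` caps each data file (the default is roughly 2 GB), and `capacity` bounds how many events the channel may hold:]

```properties
# Hypothetical agent name and paths; standard FileChannel properties.
agent.channels = myFileChannel
agent.channels.myFileChannel.type = file
agent.channels.myFileChannel.checkpointDir = /var/lib/flume/checkpoint
agent.channels.myFileChannel.dataDirs = /var/lib/flume/data
# Maximum size of each data file, in bytes (default is about 2 GB).
agent.channels.myFileChannel.maxFileSize = 2146435071
# Maximum number of events held in the channel.
agent.channels.myFileChannel.capacity = 1000000
agent.channels.myFileChannel.transactionCapacity = 10000
```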
Thanks,
Lenin
On Thu, Jul 18, 2013 at 10:36 PM, Jeremy Karlson <[EMAIL PROTECTED]> wrote:

> To follow up:
>
> My Flume agent ran out of disk space last night and appeared to stop
> processing.  I shut it down and as an experiment (it's a test machine, why
> not?) I deleted the oldest 10 data files, to see if Flume actually needed
> these when it restarted.
>
> Flume was not happy with my choices.
>
> It spit out a lot of this:
>
> 2013-07-18 00:00:00,013 ERROR [pool-40-thread-1]        o.a.f.s.AvroSource
> Avro source mySource: Unable to process event batch. Exception follows.
> java.lang.IllegalStateException: Channel closed [channel=myFileChannel].
> Due to java.lang.NullPointerException: null
>         at
> org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:353)
>         at
> org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:122)
>         ...
> Caused by: java.lang.NullPointerException
>         at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:895)
>         at org.apache.flume.channel.file.Log.replay(Log.java:406)
>         at
> org.apache.flume.channel.file.FileChannel.start(FileChannel.java:303)
>         ...
>
> So it seems like these files were actually in use, and not just leftover
> cruft.  A worthwhile thing to know, but I'd like to understand why.  My
> events are probably at most 1k of text, so it seems kind of odd to me that
> they'd consume more than 50GB of disk space in the channel.
>
> -- Jeremy
>
>
>
> On Wed, Jul 17, 2013 at 3:24 PM, Jeremy Karlson <[EMAIL PROTECTED]> wrote:
>
>> Hi All,
>>
>> I have a very busy channel that has about 100,000 events queued up.  My
>> data directory has about 50 data files, each about 1.6 GB.  I don't believe
>> my 100k events could be consuming that much space, so I'm jumping to
>> conclusions and assuming that most of these files are old and due for
>> cleanup (but I suppose it's possible).  I'm not finding much guidance in
>> the user guide on how often these files are cleaned up / removed /
>> compacted / etc.
>>
>> Any thoughts on what's going on here, or what settings I should look for?
>>  Thanks.
>>
>> -- Jeremy
>>
>
>
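
[Editor's note: the size mismatch Jeremy describes can be checked with back-of-envelope arithmetic using the thread's own figures. The file channel's data files are a write-ahead log, and a data file cannot be reclaimed while pending events still reference it, so the on-disk size can greatly exceed the live payload. A rough sketch, not a measurement:]

```python
# Figures quoted in the thread; rough arithmetic only.
events_queued = 100_000          # "about 100,000 events queued up"
event_size = 1024                # "at most 1k of text" per event
live_payload = events_queued * event_size

data_files = 50                  # "about 50 data files"
file_size = int(1.6 * 2**30)     # "each about 1.6 GB"
on_disk = data_files * file_size

print(f"live payload ~ {live_payload / 2**20:.0f} MiB")  # ~98 MiB
print(f"on disk      ~ {on_disk / 2**30:.0f} GiB")       # ~80 GiB
print(f"ratio        ~ {on_disk / live_payload:.0f}x")
```

The queued events account for under 100 MiB, three orders of magnitude less than the space consumed, which is consistent with the data files holding far more than just the currently queued events.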