Re: Flume Data Directory Cleanup
I did a hard delete.  (I was out of disk space.)  I ended up just deleting
the whole channel directory and starting fresh.
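
(Roughly, that amounted to the following; the paths and agent name here are
placeholders for my setup, not anything standard:)

  # stop the agent first so nothing is writing to the channel
  kill <agent-pid>

  # remove the data AND checkpoint directories together, so the channel
  # doesn't come back up with a checkpoint that points at missing files
  rm -rf /var/lib/flume/mychannel/data /var/lib/flume/mychannel/checkpoint

  # start fresh
  bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name agent1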

I am running a very recent version, so I don't think I'd be affected by the
file removal bug...  And obviously my files were still in use, for reasons
I don't understand yet.

-- Jeremy
On Thu, Jul 18, 2013 at 11:09 AM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:

> Flume's deletion strategy is quite conservative. We wait for two
> checkpoints after all data has been removed from a file before the file is
> deleted. In this case, it does look like the data was actually still
> referenced. We had a bug some time back that caused files not to be
> deleted, but that was fixed quite a while ago.
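>
> (For reference, the cadence here is the file channel's checkpoint
> interval. A minimal sketch of the relevant settings, with placeholder
> agent/channel names and paths:)
>
>   agent1.channels.ch1.type = file
>   agent1.channels.ch1.checkpointDir = /var/lib/flume/ch1/checkpoint
>   agent1.channels.ch1.dataDirs = /var/lib/flume/ch1/data
>   # checkpoint every 30 seconds (the default); per the above, a data
>   # file is removed only after two checkpoints no longer reference it
>   agent1.channels.ch1.checkpointInterval = 30000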
>
> Thanks,
> Hari
>
> On Thursday, July 18, 2013 at 10:56 AM, Camp, Roy wrote:
>
> We have noticed a few times that cleanup did not happen properly, but a
> restart generally forced a cleanup.
>
>
> I would recommend putting the data files back unless you did a hard
> delete. Alternatively, make sure you remove (back up first) the checkpoint
> files if you delete the data files. That should put Flume back to a fresh
> state.
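>
> (Something along these lines, with placeholder paths; adjust for your
> channel's configured directories:)
>
>   # back up the checkpoint, then remove it
>   cp -r /var/lib/flume/ch1/checkpoint /var/lib/flume/ch1/checkpoint.bak
>   rm -rf /var/lib/flume/ch1/checkpoint
>
>   # on restart, the channel rebuilds its state by replaying whatever
>   # data files are still present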
>
>
> Roy
>
>
> From: Jeremy Karlson [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, July 18, 2013 10:42 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Flume Data Directory Cleanup
>
>
> Thank you for your suggestion.  I took a careful look at that, and I'm not
> sure it describes my situation.  That refers to the sink, while my problem
> is with the channel.  I'm looking at a dramatic accumulation of log / meta
> files within the channel data directory.
>
> Additionally, I did try doing a manual cleanup of the channel directory,
> deleting the oldest log / meta files.  (This was my experiment.)  Flume
> really did not like that.  If manual cleanup is required in the channel as
> well, the cutoff point at which files go from being in use to unused is not
> clear to me.
>
>
> -- Jeremy
>
>
> On Thu, Jul 18, 2013 at 10:13 AM, Lenin Raj <[EMAIL PROTECTED]> wrote:
>
> Hi Jeremy,
>
> Regarding cleanup, this was already discussed once:
>
>
> http://mail-archives.apache.org/mod_mbox/flume-user/201306.mbox/%[EMAIL PROTECTED]%3E
>
> You have to do it manually.
>
> Thanks,
> Lenin
>
> On Thu, Jul 18, 2013 at 10:36 PM, Jeremy Karlson <[EMAIL PROTECTED]>
> wrote:
>
> To follow up:
>
>
> My Flume agent ran out of disk space last night and appeared to stop
> processing.  I shut it down and, as an experiment (it's a test machine, why
> not?), deleted the ten oldest data files to see if Flume actually needed
> them when it restarted.
>
> Flume was not happy with my choices.
>
> It spit out a lot of this:
>
>
> 2013-07-18 00:00:00,013 ERROR [pool-40-thread-1] o.a.f.s.AvroSource
> Avro source mySource: Unable to process event batch. Exception follows.
> java.lang.IllegalStateException: Channel closed [channel=myFileChannel].
> Due to java.lang.NullPointerException: null
>         at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:353)
>         at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:122)
>         ...
> Caused by: java.lang.NullPointerException
>         at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:895)
>         at org.apache.flume.channel.file.Log.replay(Log.java:406)
>         at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:303)
>         ...
>
> So it seems like these files were actually in use, and not just leftover
> cruft.  A worthwhile thing to know, but I'd like to understand why.  My
> events are probably at most 1k of text, so it seems kind of odd to me that
> so many data files would still be in use.