Are you sure all your events were taken off the channel by the sink?
Did you verify that all the data you sent landed at the final destination? I
have had my file channel back up like this when sinking to a slow destination,
but the file channel eventually drains to a few MB, provided I'm not
adding data faster than the sink can remove it.
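Whether a channel grows or drains comes down to the put rate versus the take rate. Here is a minimal, idealized Python sketch of that. The function `backlog_over_time` is hypothetical and assumes constant rates. It models only the number of queued events, not the on-disk size of the file channel. As the 50+ GB channel described below shows, on-disk size can lag behind.

```python
def backlog_over_time(put_rate, take_rate, seconds, initial=0):
    """Idealized channel depth (in events) at the end of each second.

    Hypothetical model: constant put/take rates, and the sink can only
    take events that are actually queued in the channel.
    """
    depth = initial
    history = []
    for _ in range(seconds):
        depth += put_rate                # source puts events onto the channel
        depth -= min(take_rate, depth)   # sink takes as many as it can
        history.append(depth)
    return history

# Sink slower than source: the backlog keeps growing.
print(backlog_over_time(put_rate=1000, take_rate=800, seconds=5))
# Source stopped: an existing 1000-event backlog drains.
print(backlog_over_time(put_rate=0, take_rate=800, seconds=3, initial=1000))
```

The first call prints `[200, 400, 600, 800, 1000]` and the second prints `[200, 0, 0]`.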
I have seen a similar problem only once, while evaluating Flume, and was
unable to reproduce it. I had 4 parallel flows. I killed the agents in
the storage/filter tier (http://blogs.apache.org/flume/) and let logs
back up in the collector tier. I watched the file channels on the
collector tier grow to tens of GB each before restarting the
storage/filter tier agents. 3 of the 4 file channels backing the 4
parallel flows drained to a few MB each. The 4th, however, did not. Even
after I stopped putting data on the flows and verified that all data
successfully landed in the final sink location, the 4th channel was still
50+ GB. I stopped and restarted the agent, and it iterated
through all the data/checkpoint files. Ultimately it sent a couple more
batches of events, and the channel emptied.
So yes, I have seen your problem; however, it was either explainable or
not reproducible. It was explainable when data was added to the
channel faster than the sink could remove it, and not reproducible in the one
other case, where Flume fixed itself on a restart.
Because of the one time I saw the channel fail to clear, I will be
monitoring the file channel size outside of Flume as a precaution when
we move Flume to production.
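Here is a minimal sketch of that kind of external check, written in Python. It sums the sizes of the files under each directory and warns when a directory exceeds a threshold. The default path `/var/lib/flume/file-channel/data` and the 5 GiB threshold are hypothetical. Point the script at whatever `checkpointDir` and `dataDirs` your agent's file channel is configured with, and pick a threshold that suits your volume.

```python
import os
import sys

def dir_size_bytes(path):
    """Total size of regular (non-symlink) files under path, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def check_channel(paths, threshold_bytes):
    """Return (path, size) pairs for directories larger than the threshold."""
    over = []
    for p in paths:
        size = dir_size_bytes(p)
        if size > threshold_bytes:
            over.append((p, size))
    return over

if __name__ == "__main__":
    # Hypothetical default location; pass your real dataDirs as arguments.
    dirs = sys.argv[1:] or ["/var/lib/flume/file-channel/data"]
    threshold = 5 * 1024 ** 3  # 5 GiB, an arbitrary example threshold
    for path, size in check_channel(dirs, threshold):
        print(f"WARNING: {path} is {size / 1024 ** 3:.1f} GiB", file=sys.stderr)
```

You could run this from cron and feed the warning into whatever alerting you already have. A missing directory simply reports as empty and does not raise an error.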
On 04/11/2013 02:37 PM, Madhu Gmail wrote:
> I have not heard from anyone, so I just want to make sure I have explained the issue correctly.
> I think this is a common problem for everyone who uses Flume.
> When the Flume sink consumes log events from the file channel, what happens to the data that was committed to local disk under the data directory?
> Will it grow indefinitely, like log-1, log-2, log-3, and so on?
> Do I have to write a script to remove the data from the data directory?
> Madhu Munagala
> On Apr 11, 2013, at 11:52 AM, Madhu Gmail <[EMAIL PROTECTED]> wrote:
>> How do I clean up the data in the file channel data folder? After the log events are processed by the sink, I still see log-1 and log-2 at 1.6 GB and 1.2 GB.
>> Once the log events are processed by the sink, shouldn't the data directory under file-channel be empty?
>> Madhu Munagala