Flume, mail # dev - Review Request: ExecSource don't flush the cache if there is no input entries


Re: Review Request: ExecSource don't flush the cache if there is no input entries
Hari Shreedharan 2013-01-07, 22:20

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8854/#review15125
-----------------------------------------------------------
Thanks for the patch!

I like the idea, but this approach does not look sufficient, since the timeout is checked only when a new line is read. If the process writes a few initial lines and then nothing more, the flush never happens.

Also, please add a unit test for the feature.
flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java
<https://reviews.apache.org/r/8854/#comment32717>

    How does this help? The readLine() method blocks until the next line is read from the process's stdout, right? So if the process writes only batchSize - 1 events before the timeout and then never writes again, the source would still not flush, right? You probably need to add another thread to make sure the flush happens.
    
    Also, when you do this, you need to be careful about synchronization: you will probably need to put this code inside a synchronized block or lock, and put the timeout flush code under the same lock or synchronized block.
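
To make the suggestion concrete, here is a minimal sketch of the idea, not the actual ExecSource code. The names TimedBatcher, add, flushIfStale and flushLocked are hypothetical, and the Consumer stands in for the write to the channel. A separate timer thread flushes a partial batch once batchTimeoutMillis has passed since the last flush, and both flush paths take the same lock as the code that appends events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical illustration only; not part of Flume.
class TimedBatcher implements AutoCloseable {
    private final int batchSize;
    private final long batchTimeoutMillis;
    private final Consumer<List<String>> sink;   // stand-in for the channel write
    private final List<String> buffer = new ArrayList<>();
    private final Object lock = new Object();    // guards buffer and lastFlushNanos
    private long lastFlushNanos = System.nanoTime();
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "batch-timeout-flusher");
            t.setDaemon(true);                   // do not keep the JVM alive
            return t;
        });

    TimedBatcher(int batchSize, long batchTimeoutMillis, Consumer<List<String>> sink) {
        this.batchSize = batchSize;
        this.batchTimeoutMillis = batchTimeoutMillis;
        this.sink = sink;
        // The timer runs independently of the reader thread, so a blocked
        // readLine() cannot prevent the timeout check.
        timer.scheduleWithFixedDelay(this::flushIfStale,
            batchTimeoutMillis, batchTimeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Called by the reader thread for every line it reads.
    void add(String line) {
        synchronized (lock) {
            buffer.add(line);
            if (buffer.size() >= batchSize) {
                flushLocked();
            }
        }
    }

    // Called by the timer thread; flushes a non-empty, stale partial batch.
    private void flushIfStale() {
        synchronized (lock) {
            long elapsedMillis =
                TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - lastFlushNanos);
            if (!buffer.isEmpty() && elapsedMillis >= batchTimeoutMillis) {
                flushLocked();
            }
        }
    }

    // Must be called with the lock held.
    private void flushLocked() {
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
        lastFlushNanos = System.nanoTime();
    }

    @Override
    public void close() {
        timer.shutdownNow();
        synchronized (lock) {
            if (!buffer.isEmpty()) {
                flushLocked();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TimedBatcher batcher = new TimedBatcher(20, 200,
            batch -> System.out.println("flushed " + batch.size() + " event(s)"));
        batcher.add("only one line, then silence");
        Thread.sleep(1000);                      // the timer flushes the partial batch
        batcher.close();
    }
}
```

Because the timer fires at a fixed delay equal to the timeout, a partial batch can wait for up to roughly twice the timeout in the worst case; a shorter check interval would tighten that at the cost of more wakeups.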
- Hari Shreedharan
On Jan. 7, 2013, 5:57 a.m., Fengdong Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8854/
> -----------------------------------------------------------
>
> (Updated Jan. 7, 2013, 5:57 a.m.)
>
>
> Review request for Flume.
>
>
> Description
> -------
>
> ExecSource has a default batchSize of 20. The exec source reads data from its input and puts it into a cache; once the cache is full, it pushes the cached entries to the channel.
>
> But if the exec source's cache is not full and there is no input for a long time, the entries stay in the cache. They have no chance of reaching the channel until the cache fills up.
>
> So the patch adds a new config option for ExecSource, batchTimeout, with a default of 3 seconds. If batchTimeout is exceeded, all cached data is pushed to the channel even if the cache is not full.
>
>
> Diffs
> -----
>
>   flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java 495b03f
>   flume-ng-core/src/main/java/org/apache/flume/source/ExecSourceConfigurationConstants.java 1b35b01
>
> Diff: https://reviews.apache.org/r/8854/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Fengdong Yu
>
>