Karthikeyan Muthukumarasa... 2012-09-27, 13:55
Mike Percy 2012-09-27, 21:08
Karthikeyan Muthukumarasa... 2012-09-28, 04:23
Re: Flume latency issue
Hi MK,
Based on a quick look at the code, it looks like the RollingFileSink doesn't
short-circuit on batches the way most of the other sinks do. I would
consider that a minor bug that should be fixed.

So basically it will wait until it can pull batchSize events off the queue
before pushing the data onto the file system.
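
To illustrate the difference, here is a minimal Python sketch. It is not Flume code: the function names, the in-memory deque standing in for the channel, and the batch size of 20 (picked to match the 20-line chunks in the quoted output) are all assumptions made for the example. It only contrasts a drain loop that waits for a full batch with one that short-circuits when the channel runs empty.

```python
from collections import deque


def drain_blocking(channel, batch_size):
    """Model of a sink that does NOT short-circuit (hypothetical).

    With fewer than batch_size events queued it writes nothing
    and leaves the events waiting in the channel.
    """
    if len(channel) < batch_size:
        return []  # nothing written; events stay queued
    return [channel.popleft() for _ in range(batch_size)]


def drain_short_circuit(channel, batch_size):
    """Take up to batch_size events, stopping early once the channel is empty."""
    batch = []
    while len(batch) < batch_size and channel:
        batch.append(channel.popleft())
    return batch


def run(drain, seconds, batch_size):
    """Feed one event per second and record (write_time, event) pairs."""
    channel = deque()
    written = []
    for t in range(seconds):
        channel.append(f"event-{t}")  # one new event arrives each second
        for event in drain(channel, batch_size):
            written.append((t, event))
    return written


if __name__ == "__main__":
    for name, drain in (("blocking", drain_blocking),
                        ("short-circuit", drain_short_circuit)):
        result = run(drain, seconds=40, batch_size=20)
        # Latency = time the event was written minus the second it arrived.
        worst = max(t - int(e.split("-")[1]) for t, e in result)
        print(f"{name}: {len(result)} events written, worst latency {worst}s")
```

Under these assumptions both variants write the same events in the same order, but the blocking variant holds the first event of each batch until the batch fills, while the short-circuit variant writes each event in the second it arrives.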

Regards,
Mike

On Thu, Sep 27, 2012 at 9:23 PM, Karthikeyan Muthukumarasamy <
[EMAIL PROTECTED]> wrote:

> Hi Mike,
> Thanks for the response!
> I'm using flume-ng version 1.2.0.
>
> In the prototype that I'm building, the final consolidated sink writes to a
> file. I intend to extend this with more specific sinks like HBase, JMX etc.
> While writing this mail, I've started to wonder whether the latency is caused
> by some buffering happening at the final FILE_ROLL sink!
> I have test scripts that write a timestamped message every second to
> the files hblog, zklog & applog.
> I expect the final consolidated sink's output to be something like this:
> 10:00:00 hblaselog entry
> 10:00:00 zklog entry
> 10:00:00 applog entry
> 10:00:01 hblaselog entry
> 10:00:01 zklog entry
> 10:00:01 applog entry
> 10:00:02 hblaselog entry
> 10:00:02 zklog entry
> 10:00:02 applog entry
> combos like above...
>
> But in reality, some batching seems to be occurring in between, and once
> every 10 secs I see the consolidated sink write chunks (one from each
> src) to the output file as follows:
> 10:00:00 hblaselog entry
> 10:00:01 hblaselog entry
> 10:00:02 hblaselog entry
> (17 more like this)
> delay...
> 10:00:00 zklog entry
> 10:00:01 zklog entry
> 10:00:02 zklog entry
> (17 more like this)
> delay...
> 10:00:00 applog entry
> 10:00:01 applog entry
> 10:00:02 applog entry
>
> My Flume conf file is as below:
> # example.conf: A single-node Flume configuration
>
> # Name the components on this agent
> agent1.sources = hbase-src zk-src app-src consolidated-src
> agent1.sinks = hbase-sink zk-sink app-sink consolidated-sink
> agent1.channels = hbase-chn zk-chn app-chn consolidated-chn
>
> # All channels are in-memory channels
> agent1.channels.hbase-chn.type = memory
> agent1.channels.zk-chn.type = memory
> agent1.channels.app-chn.type = memory
> agent1.channels.consolidated-chn.type = memory
>
> # Describe/configure hbase-src
> agent1.sources.hbase-src.type = exec
> agent1.sources.hbase-src.command = tail -F /home/efhjlns/scripts/hblog
> agent1.sources.hbase-src.channels = hbase-chn
>
> # Describe avro hbase-sink
> agent1.sinks.hbase-sink.type = avro
> agent1.sinks.hbase-sink.hostname = localhost
> agent1.sinks.hbase-sink.port = 15001
> agent1.sinks.hbase-sink.channel = hbase-chn
>
> # Describe/configure zk-src
> agent1.sources.zk-src.type = exec
> agent1.sources.zk-src.command = tail -F /home/efhjlns/scripts/zklog
> agent1.sources.zk-src.channels = zk-chn
>
> # Describe avro zk-sink
> agent1.sinks.zk-sink.type = avro
> agent1.sinks.zk-sink.hostname = localhost
> agent1.sinks.zk-sink.port = 15001
> agent1.sinks.zk-sink.channel = zk-chn
>
>
> # Describe/configure app-src
> agent1.sources.app-src.type = exec
> agent1.sources.app-src.command = tail -F /home/efhjlns/scripts/applog
> agent1.sources.app-src.channels = app-chn
>
> # Describe avro app-sink
> agent1.sinks.app-sink.type = avro
> agent1.sinks.app-sink.hostname = localhost
> agent1.sinks.app-sink.port = 15001
> agent1.sinks.app-sink.channel = app-chn
>
> # Describe/configure consolidated-src
> agent1.sources.consolidated-src.type = avro
> agent1.sources.consolidated-src.bind = localhost
> agent1.sources.consolidated-src.port = 15001
> agent1.sources.consolidated-src.channels = consolidated-chn
>
> # Describe consolidated file sink
>
> agent1.sinks.consolidated-sink.type = FILE_ROLL
> agent1.sinks.consolidated-sink.sink.directory = /home/efhjlns/flume-opt-dir
> agent1.sinks.consolidated-sink.sink.rollInterval = 0
> agent1.sinks.consolidated-sink.channel = consolidated-chn
>
> Thanks & Regards
> MK
>
>
> On Fri, Sep 28, 2012 at 2:38 AM, Mike Percy <[EMAIL PROTECTED]> wrote:
>
>> MK,