|
|
Karthikeyan Muthukumarasa... 2012-09-27, 13:55
Hi, In my project various applications and 3PPs write log into to their separate logfiles. There are two limitations with this: - the structure of the log messages are different in each log file - the log messages are in different files and I cant get a single time sorted display of all log messages, which is important in some debug situations
As a solution to this problem, I intend to: - use separate flume sources to tail various log files in the system - have interceptors for each type of flume source and convert all log messages to a common structure - all flume sinks will write to a localhost avro port - a separate flume source will read from the avro port on localhost - there will be a fan-out logic to post the data from that source to multiple channels - each connel is connected to a separate sink like JMX sink, HBase Sink etc
First of all, is this kind of usage of flume acceptable and is there anything I need to specifically take care of?
I also notice that the consolidated avro source which reads data from avro port gets data only as blocks from each source, the latency is around 20 seconds. Is it possible to reduce this latency, so at the consolidated avro source, I receive all events as they are getting logged into their log files, instantaneously?
Thanks in advance! MK
+
Karthikeyan Muthukumarasa... 2012-09-27, 13:55
-
Re: Flume latency issue
Mike Percy 2012-09-27, 21:08
MK, Which version of Flume are you using?
This design sounds reasonable to me.
Regarding your question about 20 second latency, that should not be true. Can you please confirm that you have observed this? If you have, please attach your flume.conf and we can take a look. But it should be nearly instantaneous - the batch sizes are controlled by the client or sink, and in general the sink takes as much as it can from the channel, up to the batch size, but if it takes less we continue immediately regardless. The only time we "back off" is when the channel is empty *and* no events were taken in the current batch - then the sink runner goes into an exponential backoff.
Regards, Mike
On Thu, Sep 27, 2012 at 6:55 AM, Karthikeyan Muthukumarasamy < [EMAIL PROTECTED]> wrote:
> Hi, > In my project various applications and 3PPs write log into to their > separate logfiles. > There are two limitations with this: > - the structure of the log messages are different in each log file > - the log messages are in different files and I cant get a single time > sorted display of all log messages, which is important in some debug > situations > > As a solution to this problem, I intend to: > - use separate flume sources to tail various log files in the system > - have interceptors for each type of flume source and convert all log > messages to a common structure > - all flume sinks will write to a localhost avro port > - a separate flume source will read from the avro port on localhost > - there will be a fan-out logic to post the data from that source to > multiple channels > - each connel is connected to a separate sink like JMX sink, HBase Sink etc > > First of all, is this kind of usage of flume acceptable and is there > anything I need to specifically take care of? > > I also notice that the consolidated avro source which reads data from avro > port gets data only as blocks from each source, the latency is around 20 > seconds. Is it possible to reduce this latency, so at the consolidated avro > source, I receive all events as they are getting logged into their log > files, instantaneously? > > Thanks in advance! > MK > >
+
Mike Percy 2012-09-27, 21:08
-
Re: Flume latency issue
Karthikeyan Muthukumarasa... 2012-09-28, 04:23
Hi Mike, Thanks for the response! Im use flume-ng-1.2.0 version.
In the prototype that Im building, the final consolidated sink writes to a file. I intend to extend this with more specific sinks like HBase, JMX etc. While Im writing this mail, I get a doubt if the latency is caused by some buffering happening at the final FILE_ROLL sink! I have test scripts loading messages with timestamp every one second to the files hblog, zklog & applog. I expect the final consolidated sink's output to be something like this: 10:00:00 hblaselog entry 10:00:00 zklog entry 10:00:00 applog entry 10:00:01 hblaselog entry 10:00:01 zklog entry 10:00:01 applog entry 10:00:02 hblaselog entry 10:00:02 zklog entry 10:00:02 applog entry combos like above...
But in reality, some batching seems to be occuring in between and once every 10 secs, I see that the consolidated sink writes chunks (from each src) to the output file as follows: 10:00:00 hblaselog entry 10:00:01 hblaselog entry 10:00:02 hblaselog entry (17 more like this) delay... 10:00:00 zklog entry 10:00:01 zklog entry 10:00:02 zklog entry (17 more like this) delay... 10:00:00 applog entry 10:00:01 applog entry 10:00:02 applog entry
My Flume conf file is as below: # example.conf: A single-node Flume configuration
# Name the components on this agent agent1.sources = hbase-src zk-src app-src consolidated-src agent1.sinks = hbase-sink zk-sink app-sink consolidated-sink agent1.channels = hbase-chn zk-chn app-chn consolidated-chn
# All channels are in-memory channels agent1.channels.hbase-chn.type = memory agent1.channels.zk-chn.type = memory agent1.channels.app-chn.type = memory agent1.channels.consolidated-chn.type = memory
# Describe/configure hbase-src agent1.sources.hbase-src.type = exec agent1.sources.hbase-src.command = tail -F /home/efhjlns/scripts/hblog agent1.sources.hbase-src.channels = hbase-chn
# Describe avro hbase-sink agent1.sinks.hbase-sink.type = avro agent1.sinks.hbase-sink.hostname = localhost agent1.sinks.hbase-sink.port = 15001 agent1.sinks.hbase-sink.channel = hbase-chn
# Describe/configure zk-src agent1.sources.zk-src.type = exec agent1.sources.zk-src.command = tail -F /home/efhjlns/scripts/zklog agent1.sources.zk-src.channels = zk-chn
# Describe avro zk-sink agent1.sinks.zk-sink.type = avro agent1.sinks.zk-sink.hostname = localhost agent1.sinks.zk-sink.port = 15001 agent1.sinks.zk-sink.channel = zk-chn # Describe/configure app-src agent1.sources.app-src.type = exec agent1.sources.app-src.command = tail -F /home/efhjlns/scripts/applog agent1.sources.app-src.channels = app-chn
# Describe avro app-sink agent1.sinks.app-sink.type = avro agent1.sinks.app-sink.hostname = localhost agent1.sinks.app-sink.port = 15001 agent1.sinks.app-sink.channel = app-chn
# Describe/configure consolidated-src agent1.sources.consolidated-src.type = avro agent1.sources.consolidated-src.bind = localhost agent1.sources.consolidated-src.port = 15001 agent1.sources.consolidated-src.channels = consolidated-chn
# Describe consolidated file sink
agent1.sinks.consolidated-sink.type = FILE_ROLL agent1.sinks.consolidated-sink.sink.directory = /home/efhjlns/flume-opt-dir agent1.sinks.consolidated-sink.sink.rollInterval = 0 agent1.sinks.consolidated-sink.channel = consolidated-chn
Thanks & Regards MK On Fri, Sep 28, 2012 at 2:38 AM, Mike Percy <[EMAIL PROTECTED]> wrote:
> MK, > Which version of Flume are you using? > > This design sounds reasonable to me. > > Regarding your question about 20 second latency, that should not be true. > Can you please confirm that you have observed this? If you have, please > attach your flume.conf and we can take a look. But it should be nearly > instantaneous - the batch sizes are controlled by the client or sink, and > in general the sink takes as much as it can from the channel, up to the > batch size, but if it takes less we continue immediately regardless. The > only time we "back off" is when the channel is empty *and* no events were > taken in the current batch - then the sink runner goes into an exponential
+
Karthikeyan Muthukumarasa... 2012-09-28, 04:23
-
Re: Flume latency issue
Mike Percy 2012-09-28, 22:01
Hi MK, Based on a quick look @ the code, it looks like the RollingFileSink doesn't short-circuit on batches like most of the rest of the sinks do. I would consider that a minor bug and should be fixed.
So basically it will wait until it can pull batchSize events off the queue before pushing the data onto the file system.
Regards, Mike
On Thu, Sep 27, 2012 at 9:23 PM, Karthikeyan Muthukumarasamy < [EMAIL PROTECTED]> wrote:
> Hi Mike, > Thanks for the response! > Im use flume-ng-1.2.0 version. > > In the prototype that Im building, the final consolidated sink writes to a > file. I intend to extend this with more specific sinks like HBase, JMX etc. > While Im writing this mail, I get a doubt if the latency is caused by some > buffering happening at the final FILE_ROLL sink! > I have test scripts loading messages with timestamp every one second to > the files hblog, zklog & applog. > I expect the final consolidated sink's output to be something like this: > 10:00:00 hblaselog entry > 10:00:00 zklog entry > 10:00:00 applog entry > 10:00:01 hblaselog entry > 10:00:01 zklog entry > 10:00:01 applog entry > 10:00:02 hblaselog entry > 10:00:02 zklog entry > 10:00:02 applog entry > combos like above... > > But in reality, some batching seems to be occuring in between and once > every 10 secs, I see that the consolidated sink writes chunks (from each > src) to the output file as follows: > 10:00:00 hblaselog entry > 10:00:01 hblaselog entry > 10:00:02 hblaselog entry > (17 more like this) > delay... > 10:00:00 zklog entry > 10:00:01 zklog entry > 10:00:02 zklog entry > (17 more like this) > delay... > 10:00:00 applog entry > 10:00:01 applog entry > 10:00:02 applog entry > > My Flume conf file is as below: > # example.conf: A single-node Flume configuration > > # Name the components on this agent > agent1.sources = hbase-src zk-src app-src consolidated-src > agent1.sinks = hbase-sink zk-sink app-sink consolidated-sink > agent1.channels = hbase-chn zk-chn app-chn consolidated-chn > > # All channels are in-memory channels > agent1.channels.hbase-chn.type = memory > agent1.channels.zk-chn.type = memory > agent1.channels.app-chn.type = memory > agent1.channels.consolidated-chn.type = memory > > # Describe/configure hbase-src > agent1.sources.hbase-src.type = exec > agent1.sources.hbase-src.command = tail -F /home/efhjlns/scripts/hblog > agent1.sources.hbase-src.channels = hbase-chn > > # Describe avro hbase-sink > agent1.sinks.hbase-sink.type = avro > agent1.sinks.hbase-sink.hostname = localhost > agent1.sinks.hbase-sink.port = 15001 > agent1.sinks.hbase-sink.channel = hbase-chn > > # Describe/configure zk-src > agent1.sources.zk-src.type = exec > agent1.sources.zk-src.command = tail -F /home/efhjlns/scripts/zklog > agent1.sources.zk-src.channels = zk-chn > > # Describe avro zk-sink > agent1.sinks.zk-sink.type = avro > agent1.sinks.zk-sink.hostname = localhost > agent1.sinks.zk-sink.port = 15001 > agent1.sinks.zk-sink.channel = zk-chn > > > # Describe/configure app-src > agent1.sources.app-src.type = exec > agent1.sources.app-src.command = tail -F /home/efhjlns/scripts/applog > agent1.sources.app-src.channels = app-chn > > # Describe avro app-sink > agent1.sinks.app-sink.type = avro > agent1.sinks.app-sink.hostname = localhost > agent1.sinks.app-sink.port = 15001 > agent1.sinks.app-sink.channel = app-chn > > # Describe/configure consolidated-src > agent1.sources.consolidated-src.type = avro > agent1.sources.consolidated-src.bind = localhost > agent1.sources.consolidated-src.port = 15001 > agent1.sources.consolidated-src.channels = consolidated-chn > > # Describe consolidated file sink > > agent1.sinks.consolidated-sink.type = FILE_ROLL > agent1.sinks.consolidated-sink.sink.directory = /home/efhjlns/flume-opt-dir > agent1.sinks.consolidated-sink.sink.rollInterval = 0 > agent1.sinks.consolidated-sink.channel = consolidated-chn > > Thanks & Regards > MK > > > On Fri, Sep 28, 2012 at 2:38 AM, Mike Percy <[EMAIL PROTECTED]> wrote: > >> MK,
+
Mike Percy 2012-09-28, 22:01
|
|