Re: file channel read performance impacted by write rate
The only reason I can think of for reads being slower is the overwriteMap
in the FlumeEventQueue, in which the keys of many events are changed for
takes, whereas for puts we just add the next key and insert it into the map.
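
A rough sketch of that asymmetry (illustrative only, not the actual Flume
code; the class and field names here are made up):

    import java.util.HashMap;
    import java.util.Map;

    // Map-backed FIFO queue of event pointers: a put touches a single
    // key at the tail, but a take from slot k shifts every entry between
    // the head and k, rewriting many keys in the map.
    class OverwriteMapSketch {
        private final Map<Long, Long> map = new HashMap<>(); // slot -> event pointer
        private long head = 0, tail = 0;

        void put(long eventPointer) {
            map.put(tail++, eventPointer); // one key insert per put
        }

        long take(long slot) {
            long taken = map.get(slot);
            for (long i = slot; i > head; i--) { // O(queue depth) key rewrites per take
                map.put(i, map.get(i - 1));
            }
            map.remove(head++);
            return taken;
        }
    }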
On Tuesday, December 17, 2013, Brock Noland wrote:

> Hi,
>
> 1) You are only using a single disk for the file channel, and it looks like
> a single disk for both the checkpoint and data directories, so throughput
> is going to be extremely slow (see the config sketch below).
> 2) I asked for 8-10 thread dumps of the whole JVM but got two thread dumps
> of two random threads. If you don't want to share this info on-list, please
> send it to me directly... but I will need a bunch of thread dumps to debug
> this.
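>
> For illustration (the paths here are made up, not taken from the attached
> config), splitting the checkpoint and data directories across disks looks
> like:
>
>     agent.channels.fc.type = file
>     agent.channels.fc.checkpointDir = /disk1/flume/checkpoint
>     agent.channels.fc.dataDirs = /disk2/flume/data,/disk3/flume/data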
>
> Brock
>
>
> On Tue, Dec 17, 2013 at 9:32 AM, Shangan Chen <[EMAIL PROTECTED]>wrote:
>
> the attachment flume.conf is the channel and sink config; dumps.txt is the
> thread dumps.
> channel type "dual" is a channel type I developed to combine the merits of
> the memory channel and the file channel: when the volume is not very big I
> use the memory channel; when the number of events reaches a percentage of
> the memory channel capacity, it switches to the file channel; when the
> volume decreases it switches back to memory again.
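>
> In outline (an illustrative fragment, not the actual code), the switching
> logic on the put path is:
>
>     // hypothetical dual channel: route puts by memory channel fill level
>     void put(Event event) {
>         if (memoryChannel.size() < threshold * memoryCapacity) {
>             memoryChannel.put(event);   // low volume: keep events in memory
>         } else {
>             fileChannel.put(event);     // above the threshold: spill to disk
>         }
>     }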
>
> thanks for looking into this.
>
>
> On Tue, Dec 17, 2013 at 8:54 PM, Brock Noland <[EMAIL PROTECTED]> wrote:
>
> Can you take and share 8-10 thread dumps while the sink is taking events
> "slowly"?
>
> Can you share your machine and file channel configuration?
> On Dec 17, 2013 6:28 AM, "Shangan Chen" <[EMAIL PROTECTED]> wrote:
>
> we face the same problem: the performance of taking events from the channel
> is a severe bottleneck, and it does not get better even when there are fewer
> events in the channel. following is a log of the metrics of writing to hdfs,
> writing to 5 files with a batch size of 200000; the take costs most of the
> total time.
>
>
> 17 December 2013 18:49:28,056 INFO
>  [SinkRunner-PollingRunner-DefaultSinkProcessor]
> (org.apache.flume.sink.hdfs.HDFSEventSink.process:489)  -
> HdfsSink-TIME-STAT sink[sink_hdfs_b] writers[5] eventcount[200000]
> all[44513] take[38197] append[5647] sync[17] getFilenameTime[371]
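>
> Reading those numbers: take[38197] out of all[44513] means roughly 86%
> (38197 / 44513 ≈ 0.86) of the batch time goes to taking events from the
> channel, versus about 13% for append and almost nothing for sync.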
>
>
>
>
>
> On Mon, Nov 25, 2013 at 4:46 PM, Jan Van Besien <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> Is anybody still looking into this question?
>
> Should I log it in jira such that somebody can look into it later?
>
> thanks,
> Jan
>
>
>
> On 11/18/2013 11:28 AM, Jan Van Besien wrote:
> > Hi,
> >
> > Sorry it took me a while to answer this. I compiled a small test case
> > using only off-the-shelf flume components that shows what is going on.
> >
> > The setup is a single agent with http source, null sink and file
> > channel. I am using the default configuration as much as possible.
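> >
> > For concreteness, a minimal agent1.conf along those lines (a sketch
> > assuming defaults everywhere, not the attached file itself):
> >
> >     agent1.sources = http1
> >     agent1.channels = fc1
> >     agent1.sinks = null1
> >
> >     agent1.sources.http1.type = http
> >     agent1.sources.http1.port = 8080
> >     agent1.sources.http1.channels = fc1
> >
> >     agent1.channels.fc1.type = file
> >
> >     agent1.sinks.null1.type = null
> >     agent1.sinks.null1.channel = fc1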
> >
> > The test goes as follows:
> >
> > - start the agent without sink
> > - run a script that sends http requests in multiple threads to the http
> > source (the script simply calls the url http://localhost:8080/?key=value
> > over and over again, where value is a random string of 100 chars; an
> > illustrative version is sketched after the steps).
> > - this script does about 100 requests per second on my machine. I leave
> > it running for a while, such that the file channel contains about 20000
> > events.
> > - add the null sink to the configuration (around 11:14:33 in the log).
> > - observe the logging of the null sink. You'll see in the log file that
> > it takes more than 10 seconds per 1000 events (until about event 5000,
> > around 11:15:33)
> > - stop the http request generating script (i.e. no more writing into the
> > file channel)
> > - observe the logging of the null sink: events 5000 until 20000 are all
> > processed within a few seconds.
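> >
> > An illustrative load generator along the lines of the script in step 2
> > (not the one actually used; it just reproduces the described request
> > pattern):
> >
> >     import java.net.HttpURLConnection;
> >     import java.net.URL;
> >     import java.util.concurrent.ThreadLocalRandom;
> >
> >     public class LoadGen {
> >         // random lowercase string, standing in for the 100-char value
> >         static String randomValue(int len) {
> >             StringBuilder sb = new StringBuilder(len);
> >             for (int i = 0; i < len; i++) {
> >                 sb.append((char) ('a' + ThreadLocalRandom.current().nextInt(26)));
> >             }
> >             return sb.toString();
> >         }
> >
> >         public static void main(String[] args) {
> >             for (int t = 0; t < 4; t++) {  // a few sender threads
> >                 new Thread(() -> {
> >                     while (true) {
> >                         try {
> >                             URL url = new URL("http://localhost:8080/?key=" + randomValue(100));
> >                             HttpURLConnection c = (HttpURLConnection) url.openConnection();
> >                             c.getResponseCode();   // send the request, ignore the body
> >                             c.disconnect();
> >                         } catch (Exception e) {
> >                             // ignore and keep sending
> >                         }
> >                     }
> >                 }).start();
> >             }
> >         }
> >     }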
> >
> > In the attachment:
> > - flume log
> > - thread dumps while the ingest was running and the null sink was enabled
> > - config (agent1.conf)
> >
> > I also tried with more sinks (4), see agent2.conf. The results are the
> > same.
> >
> > Thanks for looking into this,
> > Jan
> >
> >
> > On 11/14/2013 05:08 PM, Brock Noland wrote:
> >> On Thu, Nov 14, 2013 at 2:50 AM, Jan Van Besien <ja