I've been trying to get the elasticsearch sink going for the past day or
so. For the most part things just tend to work. However, once you get into
the 4-10k events per second range it seems to fall apart: the channel
(memory in this case) steadily fills up because the sink can't drain
events quickly enough to keep up.
I read in a couple of places that adding multiple sinks (at least in the
HDFS case) can improve throughput, and this did appear to help. I was able
to keep up when running 10 elasticsearch sinks with a batchSize of 10,000.
The documentation seems a bit vague on this point, so first off: when you
have multiple sinks attached to a single memory channel, does every sink
have to ack and process an event before it is removed, or is it more like
a competing-consumer model where any one sink can take the event off the
channel?
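
For reference, here's a trimmed sketch of the kind of setup I mean (host
names and most values are illustrative, and only 3 of the 10 sinks are
shown; the rest repeat the same block):

    agent.channels = memCh
    agent.channels.memCh.type = memory
    agent.channels.memCh.capacity = 1000000
    # transactionCapacity must be >= the sink batchSize
    agent.channels.memCh.transactionCapacity = 10000

    agent.sinks = es1 es2 es3
    agent.sinks.es1.type = elasticsearch
    agent.sinks.es1.channel = memCh
    agent.sinks.es1.hostNames = es-node1:9300,es-node2:9300,es-node3:9300
    agent.sinks.es1.indexName = flume
    agent.sinks.es1.batchSize = 10000
    # es2 and es3 are configured identically, all pointing at memCh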
After I got the channel to a stable fill percentage (10k batch, 10
elasticsearch sinks), I began to notice my agent dying with no log
messages. So before I keep digging, has anyone else run into these issues?
My elasticsearch cluster is 3 nodes tuned for write performance, and it
does not seem overwhelmed by flume. I had considered a second flume agent
that solely dealt with elasticsearch, since the current config also
includes 1 HDFS sink, but I'm unsure whether that will really help.
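
If I do try the split, my rough (untested) plan would be an avro hop
between the two agents, roughly (names and ports illustrative):

    # agent1 (current agent): fan out, keep HDFS local, forward the rest
    agent1.sources.src.selector.type = replicating
    agent1.sources.src.channels = hdfsCh esCh
    agent1.sinks.fwd.type = avro
    agent1.sinks.fwd.channel = esCh
    agent1.sinks.fwd.hostname = es-agent-host
    agent1.sinks.fwd.port = 4545

    # agent2 (dedicated elasticsearch agent, separate JVM)
    agent2.sources.in.type = avro
    agent2.sources.in.bind = 0.0.0.0
    agent2.sources.in.port = 4545
    agent2.sources.in.channels = memCh
    # ...then the elasticsearch sinks configured as above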