Flume was designed as a configurable pipeline for discrete events, moving
them reliably from a source (e.g. a web server application) to a
destination (e.g. HDFS).
Flume provides the facility to write the same event to multiple
destinations (e.g. HDFS and HBase, or HDFS and Cassandra).
There is also a third-party Cassandra plugin (sink) for Flume NG that will
write events into Cassandra.
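For reference, fan-out in Flume NG is configured with a replicating channel
selector and one channel per sink. A rough sketch of an agent configuration
follows; the source command, HDFS path, and especially the Cassandra sink's
class name are placeholders, so check the plugin's documentation for the
actual class and properties:

```properties
# One source fanned out over two channels, each drained by its own sink
agent.sources = webSrc
agent.channels = hdfsChan cassChan
agent.sinks = hdfsSink cassSink

# Example source: tail a web server access log (exec source)
agent.sources.webSrc.type = exec
agent.sources.webSrc.command = tail -F /var/log/httpd/access_log
agent.sources.webSrc.channels = hdfsChan cassChan
# The replicating selector copies every event to all listed channels
agent.sources.webSrc.selector.type = replicating

agent.channels.hdfsChan.type = memory
agent.channels.cassChan.type = memory

# Built-in HDFS sink
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = hdfsChan
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d

# Third-party Cassandra sink -- this class name is a placeholder;
# use the fully qualified class name the plugin actually ships
agent.sinks.cassSink.type = com.example.flume.CassandraSink
agent.sinks.cassSink.channel = cassChan
```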
Whether or not you process the log "on the fly" is going to depend on your
use case and resources, but if it is feasible, then writing directly into
Cassandra is probably going to be the most efficient.
I am not personally familiar with the logprocessing plugin you mention, but
it appears to be built on top of the old Flume.
We highly recommend using Flume NG going forward, so it sounds like you
might want to try Flume NG with the Cassandra sink.
Hope this helps.
On Sun, Feb 24, 2013 at 8:39 PM, Daniel Bruno <[EMAIL PROTECTED]> wrote:
> Hello everyone,
> I'm researching about Flume as a solution for web analytics.
> I read some texts about that, and my idea is to use Flume to collect the
> logs and put them in a Cassandra database. But first I have some doubts
> that I want to share.
> Is it a good approach to process the log "on the fly" and insert the
> processed result into the database?
> Or is it better to collect the logs, store them (e.g. in HDFS), and run
> scheduled jobs with Pig, later inserting into a database like HBase or
> Cassandra?
> I found an interesting solution made by Gemini (now Cloudian) called
> logprocessing; has anyone used it?
> Daniel Bruno