Daniel Bruno 2013-02-25, 04:39
Jeff Lord 2013-02-25, 16:37
Your explanation was very useful.
On Mon, Feb 25, 2013 at 12:37 PM, Jeff Lord <[EMAIL PROTECTED]> wrote:
> Flume was designed as a configurable pipeline for discrete events, in order
> to get them reliably from a source (e.g. a web server application) to a
> destination (e.g. HDFS).
> Flume provides the facility to write the same event to multiple
> destinations (e.g. HDFS and HBase, or HDFS and Cassandra).
> There is also a third-party Cassandra plugin (sink) for Flume NG that will
> write events into Cassandra.
> Whether or not you process the log "on the fly" is going to depend on your
> use case and resources, but if it is feasible then writing directly into
> Cassandra is probably going to be the most efficient.
> I am not personally familiar with the logprocessing plugin you mention, but
> it appears to be built on top of the old Flume.
> We highly recommend using Flume NG going forward, so it sounds like you
> might want to try Flume NG with the Cassandra sink.
> Hope this helps.
> On Sun, Feb 24, 2013 at 8:39 PM, Daniel Bruno <[EMAIL PROTECTED]> wrote:
>> Hello everyone,
>> I'm researching Flume as a solution for web analytics.
>> I have read a bit about it, and my idea is to use Flume to collect the
>> logs and put them in a Cassandra database. But first I have some questions
>> that I want to share.
>> Is it a good approach to process the logs "on the fly" and insert them
>> into the database already processed?
>> Or is it better to collect the logs, store them (e.g. in HDFS), run
>> scheduled jobs with Pig, and later insert the results into a database
>> like HBase or
>> I found an interesting solution made by Gemini (now Cloudian) called
>> logprocessing. Has anyone used it?
>> Daniel Bruno
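
For reference, the fan-out Jeff describes (writing the same event to multiple
destinations) could look roughly like the Flume NG agent configuration below.
This is only a sketch under assumptions: the agent name (a1), the component
names (r1, c1, c2, k1, k2), the log path, the HDFS path, and the HBase table
and column family are all made up for illustration and are not taken from the
thread. The bundled HBase sink is used as the second destination; a
third-party Cassandra sink would presumably take its place as k2, with
properties defined by that plugin, so none are shown here.

```properties
# Illustrative names only: a1, r1, c1, c2, k1, k2 are assumptions.
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

# Source: tail a web server access log (path is hypothetical).
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1 c2
# The replicating selector copies every event to all listed channels.
a1.sources.r1.selector.type = replicating

# One channel per destination, so a slow sink does not block the other.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c2.type = memory
a1.channels.c2.capacity = 10000

# Destination 1: raw events into HDFS (URI is hypothetical).
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/weblogs
a1.sinks.k1.channel = c1

# Destination 2: the same events into HBase (table and family are hypothetical).
a1.sinks.k2.type = hbase
a1.sinks.k2.table = weblogs
a1.sinks.k2.columnFamily = raw
a1.sinks.k2.channel = c2
```

Memory channels keep the sketch short, but they drop buffered events if the
agent stops; a file channel is the usual choice when events must survive a
restart.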