Flume >> mail # user >> Log processing


Daniel Bruno 2013-02-25, 04:39
Jeff Lord 2013-02-25, 16:37
Re: Log processing
Thanks Jeff,

Your explanation was very useful.
On Mon, Feb 25, 2013 at 12:37 PM, Jeff Lord <[EMAIL PROTECTED]> wrote:

> Daniel,
>
> Flume was designed as a configurable pipeline for discrete events in order
> to get them reliably from a source (e.g. web server application) -> to a
> destination (e.g. into hdfs).
> Flume provides the facility to write the same event to multiple
> destinations (e.g. HDFS and Hbase or HDFS and Cassandra).
> There is also a third party cassandra plugin (sink) for Flume NG that will
> write events into Cassandra.
> https://github.com/btoddb/flume-ng-cassandra-sink
> Whether or not you process the log "on the fly" is going to depend on your
> use case and resources, but if it is feasible then writing directly into
> Cassandra is probably going to be the most efficient.
>
> I am not personally familiar with the logprocessing plugin you mention but
> it appears to be built on top of the old flume.
> We highly recommend using Flume NG going forward, so it sounds like you
> might want to try Flume NG with the cassandra sink.
>
> Hope this helps.
>
> -Jeff
>
> On Sun, Feb 24, 2013 at 8:39 PM, Daniel Bruno <[EMAIL PROTECTED]> wrote:
>
>> Hello everyone,
>>
>> I'm researching Flume as a solution for web analytics.
>>
>> I have read some texts on the subject, and my idea is to use Flume to
>> collect the logs and put them into a Cassandra database. But first I have
>> some doubts that I want to share.
>>
>> Is it a good approach to process the log "on the fly" and insert the
>> processed data into the database?
>>
>> Or is it better to collect the logs, store them (e.g. in HDFS), run
>> scheduled Pig jobs, and later insert the results into a database like
>> HBase or Cassandra?
>>
>> I found an interesting solution made by Gemini (now Cloudian) called
>> logprocessing. Has anyone used it?
>>
>>
>> Thanks
>> --
>> Daniel Bruno
>> http://danielbruno.eti.br
>>
>
>
--
Daniel Bruno
http://danielbruno.eti.br
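
[Editor's note] The fan-out Jeff describes, one source feeding both HDFS and Cassandra, maps onto Flume NG's replicating channel selector. Below is a minimal illustrative agent configuration, not taken from the thread: the agent name, paths, and the exec-source command are made up for the example, and the Cassandra sink class name is an assumption based on the linked flume-ng-cassandra-sink project (check its README for the exact fully qualified class name and properties).

```properties
# Hypothetical Flume NG agent: replicate each event to two channels,
# one drained by an HDFS sink and one by the third-party Cassandra sink.
agent.sources  = webSrc
agent.channels = hdfsCh cassCh
agent.sinks    = hdfsSink cassSink

# Tail a web server log (exec source is simple for illustration,
# but offers weaker delivery guarantees than e.g. a spooling directory source).
agent.sources.webSrc.type = exec
agent.sources.webSrc.command = tail -F /var/log/httpd/access_log

# The replicating selector (Flume's default) copies every event
# to all channels listed for the source.
agent.sources.webSrc.selector.type = replicating
agent.sources.webSrc.channels = hdfsCh cassCh

agent.channels.hdfsCh.type = memory
agent.channels.cassCh.type = memory

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d
agent.sinks.hdfsSink.channel = hdfsCh

# Assumed class name from the flume-ng-cassandra-sink project on GitHub;
# verify against that project's documentation before use.
agent.sinks.cassSink.type = com.btoddb.flume.sinks.cassandra.CassandraSink
agent.sinks.cassSink.channel = cassCh
```

If batch reprocessing with Pig is also wanted, the same pattern extends naturally: the HDFS channel keeps a raw archive while the Cassandra channel serves the "on the fly" path.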