Flume >> mail # user >> Syslog Infrastructure with Flume

Josh West 2012-10-26, 14:05
Ron Thielen 2012-10-26, 21:06
Ralph Goers 2012-10-29, 00:45
Roshan Naik 2012-10-29, 23:37
Re: Syslog Infrastructure with Flume
Very cool.  HCatalog seems like a nice idea, as otherwise lots of
thought and planning must go into how one stores their data to ensure
it can be read from all the different Apache Hadoop related projects. E.g.,
if you store Flume data in an HDFS path with
/something=partition1/foo=bar, you'll need to use special Pig libraries
to load the partitions.
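
To illustrate what a reader of such a layout has to do, here is a minimal sketch that pulls the key=value partition components out of a path like the one above. The function name `parse_partitions` and the sample file name are hypothetical; this is not how Pig or HCatalog implement partition discovery, only the general idea.

```python
from pathlib import PurePosixPath


def parse_partitions(path):
    """Return the key=value directory components of a path as a dict."""
    partitions = {}
    for part in PurePosixPath(path).parts:
        key, sep, value = part.partition("=")
        # Only components of the form key=value are treated as partitions.
        if sep and key:
            partitions[key] = value
    return partitions


if __name__ == "__main__":
    # "events.log" is a made-up file name used only for illustration.
    print(parse_partitions("/something=partition1/foo=bar/events.log"))
```

Every tool that reads the data needs some equivalent of this step, which is the planning burden a shared catalog would remove.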

Keep us updated please!

On 10/30/2012 12:37 AM, Roshan Naik wrote:
> I am in the process of investigating the possibility of creating an
> HCatalog sink for Flume which should be able to handle such use cases.
> For your use case it could be thought of as a Hive sink. The goal is
> basically as follows: it would allow multiple Flume agents to pump
> logs into Hive tables. That would make the data queryable without
> additional manual steps. Data will get added periodically in the form
> of new partitions to Hive. You would not have to deal with temporary
> files or manual addition of data into Hive.
> -roshan
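
As a rough sketch of "data added periodically in the form of new partitions", the snippet below maps an event timestamp to an hourly partition directory. It assumes the timestamp is in epoch milliseconds; the partition keys `dt` and `hour`, the base path, and the function name are all hypothetical and are not taken from the proposed sink.

```python
from datetime import datetime, timezone


def partition_path(base, epoch_millis):
    """Map an event timestamp (epoch milliseconds) to an hourly partition directory."""
    ts = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    # Hypothetical layout: one directory per UTC day and hour.
    return f"{base}/dt={ts:%Y-%m-%d}/hour={ts:%H}"


if __name__ == "__main__":
    print(partition_path("/flume/logs", 1351573200000))
```

A sink following this idea would register each new directory as a partition once its time window closes, so queries see the data without a manual load step.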
> On Sun, Oct 28, 2012 at 5:45 PM, Ralph Goers wrote:
>     Since you ask...
>     In our environment our primary concern is audit logs - we have
>     to audit banking transactions as well as changes administrators
>     make. We have a legacy system that needed to be integrated that
>     had records in a form different than what we want stored. We also
>     need to allow administrators to view events as close to real time
>     as possible. Plus we have to aggregate data across 2 data centers.
>     Although we are currently not including web server access logs we
>     plan to integrate them in over time.  We also have requirements
>     from our security team to pass events for their use to ArcSight.
>     1. We have a "log extractor" that receives legacy events as they
>     occur and converts them into our new format and passes them to
>     Flume. All new applications use the Log4j 2 Flume Appender to get
>     data to Flume.
>     2. Flume passes the data to ArcSight for our security team's use.
>     3. We wrote a Flume to Cassandra Sink.
>     4. We wrote our own REST query services to retrieve the data from
>     Cassandra.
>     5. Since we are using DataStax Enterprise version of Cassandra we
>     have also set up "Analytic" nodes that run Hadoop on top of
>     Cassandra. This allows the data to be accessed via normal Hadoop
>     tools for data analytics.
>     6. We have written our own reporting UI component in our
>     Administrative Platform to allow administrators to view activities
>     in real time or to schedule background data collection so users
>     can post process the data on their own.
>     We do not have anything to allow an admin to "tail" the log but it
>     wouldn't be hard at all to write an application to accept Flume
>     events via Avro and display the last "n" events as they arrive.
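
The "last n events" part of such a tail application can be sketched with a bounded buffer. This is only an illustration of the buffering idea: the class name `TailBuffer` is hypothetical, and receiving events from Flume over Avro is deliberately left out.

```python
from collections import deque


class TailBuffer:
    """Keep only the most recent n events, discarding the oldest first."""

    def __init__(self, n):
        # A deque with maxlen drops items from the left when it is full.
        self._events = deque(maxlen=n)

    def append(self, event):
        self._events.append(event)

    def last(self):
        """Return the retained events, oldest to newest."""
        return list(self._events)


if __name__ == "__main__":
    buf = TailBuffer(3)
    for body in ["e1", "e2", "e3", "e4", "e5"]:
        buf.append(body)
    print(buf.last())
```

Whatever receives the events would call `append` for each one, and the display would render `last()` on every refresh.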
>     One thing I should point out. We format our events in accordance
>     with RFC 5424 and store that in the Flume event body. We then
>     store all our individual pieces of audit event data in Flume
>     header fields.  The RFC 5424 message is what we send to ArcSight.
>     The event fields and the compressed body are all stored in
>     individual columns in Cassandra.
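
For readers unfamiliar with RFC 5424, the following minimal sketch builds a message in that format from a set of event fields. Placing the audit fields in a structured-data element is an assumption made for illustration, as are the function names, the SD-ID `audit@32473` (32473 is the example enterprise number used in the RFC), and all sample values; the message does not describe the actual layout used.

```python
def escape_sd_value(value):
    """Backslash-escape the characters RFC 5424 reserves in a PARAM-VALUE."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


def format_rfc5424(facility, severity, timestamp, hostname, app_name,
                   procid, msgid, sd_id, fields, msg):
    """Build '<PRI>1 TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG'."""
    pri = facility * 8 + severity  # PRI combines facility and severity
    params = " ".join(
        f'{key}="{escape_sd_value(str(value))}"' for key, value in fields.items()
    )
    # "-" is the NILVALUE used when there is no structured data.
    sd = f"[{sd_id} {params}]" if fields else "-"
    return f"<{pri}>1 {timestamp} {hostname} {app_name} {procid} {msgid} {sd} {msg}"


if __name__ == "__main__":
    # All values below are made up for illustration.
    print(format_rfc5424(13, 6, "2012-10-26T14:05:00Z", "host1", "audit",
                         "-", "login", "audit@32473",
                         {"user": "alice"}, "User logged in"))
```

In the setup described above, a string like this would be the Flume event body, while the same fields would also travel separately as event headers.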
>     Ralph
>     On Oct 26, 2012, at 2:06 PM, Ron Thielen wrote:
>>     I am exactly where you are with this, except for the problem of
>>     my not having had time to write a serializer to address the
>>     Hostname Timestamp issue. Questions about the use of Flume in this
>>     manner seem to recur on a regular basis, so it seems a common use
>>     case.
>>     Sorry I cannot offer a solution since I am in your shoes at the
>>     moment, unfortunately looking at storing logs twice.
>>     Ron Thielen
>>     *From:* Josh West [mailto:[EMAIL PROTECTED]]
>>     *Sent:* Friday, October 26, 2012 9:05 AM

Josh West
Lead Systems Administrator
Hari Shreedharan 2012-10-31, 19:22
Roshan Naik 2012-10-31, 20:31
Hari Shreedharan 2012-10-31, 20:39
Roshan Naik 2012-11-01, 18:19
Hari Shreedharan 2012-11-01, 22:52
Roshan Naik 2012-11-01, 22:54
Josh West 2012-10-30, 09:42