Emile Kao 2012-12-04, 10:04
Hello guys, now that I have successfully set up a running Flume / Hadoop system for my customer, I would like to ask for help implementing a requirement from the customer.
Here is what the use case looks like:
1. The customer has many Apache web servers and WebSphere Application Servers that produce many logs.
2. Customer wants to provide the logs to the developer team without giving them direct access to the machines hosting the logs.
3. The idea is to collect all the log files in one place and let the developer team access them through a web interface.
4. My goal is to solve this problem using Flume / Hadoop.
Questions:
1. What is the best way to implement such a scenario using Flume / Hadoop?
2. The customer would like to keep the log files in their original state (file name, size, etc.). Is that practicable using Flume?
3. Is there a better way to collect the files than using the "Exec source" with the "tail -F" command?
Many Thanks and Cheers, Emile
Re: A customer use case
Nitin Pawar 2012-12-04, 10:21
This is really doable with minimal effort on your end.
Use Flume with the HDFS sink. You can name the files as you like and roll them over on HDFS based on number of events, size, or time.
Developers can then access the logs through the HDFS NameNode URI, or a simple Java DFS client inside a container can serve them with more security in place.
On the question of a better way to collect the logs: yes, you can achieve it by using pipes, but in my view it will be a little complicated for a very minimal performance improvement. Others may suggest otherwise.
-- Nitin Pawar
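The HDFS sink settings mentioned in this reply (file naming plus rolling by event count, size, or time) could be sketched as below. This is only an illustration: the agent, sink, and channel names and the HDFS path are hypothetical placeholders, and the helper function is not part of Flume. It simply prints lines for a Flume properties file using the HDFS sink's `hdfs.path`, `hdfs.filePrefix`, `hdfs.fileType`, `hdfs.rollInterval`, `hdfs.rollSize`, and `hdfs.rollCount` settings.

```python
def hdfs_sink_properties(agent, sink, channel, path, prefix,
                         roll_seconds=0, roll_bytes=0, roll_events=0):
    """Return Flume properties lines for one HDFS sink.

    All names passed in are placeholders chosen by the caller.
    A roll value of 0 disables that particular roll trigger.
    """
    base = f"{agent}.sinks.{sink}"
    settings = {
        "type": "hdfs",
        "channel": channel,
        "hdfs.path": path,                  # target directory on HDFS
        "hdfs.filePrefix": prefix,          # prefix of the files Flume creates
        "hdfs.fileType": "DataStream",      # write plain data, not SequenceFiles
        "hdfs.rollInterval": roll_seconds,  # roll by time (seconds)
        "hdfs.rollSize": roll_bytes,        # roll by size (bytes)
        "hdfs.rollCount": roll_events,      # roll by number of events
    }
    return [f"{base}.{key} = {value}" for key, value in settings.items()]


if __name__ == "__main__":
    # Hypothetical example: roll one file per hour, named access.*
    for line in hdfs_sink_properties("agent1", "hdfsSink", "ch1",
                                     "hdfs://namenode/logs/apache",
                                     "access", roll_seconds=3600):
        print(line)
```

With only `roll_seconds` set, the other two triggers stay at 0 (disabled), so files would roll on time alone. Note that the file names are still generated by Flume from the prefix, so they will not match the original log file names.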
Re: A customer use case
Mike Percy 2012-12-04, 14:48
Hi Emile,
On Tue, Dec 4, 2012 at 2:04 AM, Emile Kao <[EMAIL PROTECTED]> wrote:
> 1. Which is the best way to implement such a scenario using Flume/ Hadoop?
In the latest trunk and the upcoming Flume 1.3.0 builds, you could use the file spooling client / source to stream these files in, along with the HDFS sink.
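A spooling directory source of this kind might be configured roughly as follows. This is a sketch under assumptions: the agent, source, and channel names and the spool directory are hypothetical, and the helper is not part of Flume. It only prints properties lines using the source type `spooldir` with its `spoolDir` and `fileHeader` settings; `fileHeader = true` asks the source to record each file's path in an event header.

```python
def spooldir_source_properties(agent, source, channel, spool_dir):
    """Return Flume properties lines for one spooling directory source.

    All names passed in are placeholders chosen by the caller.
    """
    base = f"{agent}.sources.{source}"
    return [
        f"{base}.type = spooldir",
        f"{base}.channels = {channel}",   # sources take a list of channels
        f"{base}.spoolDir = {spool_dir}", # directory watched for new files
        f"{base}.fileHeader = true",      # keep the source file path per event
    ]


if __name__ == "__main__":
    # Hypothetical example values
    for line in spooldir_source_properties("agent1", "spool", "ch1",
                                           "/var/log/apache/completed"):
        print(line)
```

Files placed in the spool directory should already be complete (for example, rotated logs), since this source is meant for finished files rather than files still being written.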
> 2. The customer would like to keep the log files in their original state (file name, size, etc.). Is it practicable using Flume?
Not recommended. Flume is an event streaming system, not a file copying mechanism. If you want to do that, just use some scripts with hadoop fs -put instead of Flume. Flume provides a bunch of stream-oriented features on top of its event streaming architecture, such as data enrichment capabilities, event routing, and configurable file rolling on HDFS, to name a few.
Regards, Mike
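The script-based alternative suggested in this reply could look something like the sketch below. It is only an illustration under assumptions: the directory names and the `*.log` pattern are hypothetical, and it presumes the `hadoop` command is on the PATH of the machine running it. Each local file is copied with `hadoop fs -put` to a target that keeps the original file name.

```python
import subprocess
from pathlib import Path


def build_put_commands(local_dir, hdfs_dir, pattern="*.log"):
    """Build one `hadoop fs -put` command per matching local file.

    The HDFS target keeps the original file name unchanged.
    """
    commands = []
    for path in sorted(Path(local_dir).glob(pattern)):
        if path.is_file():
            target = f"{hdfs_dir.rstrip('/')}/{path.name}"
            commands.append(["hadoop", "fs", "-put", str(path), target])
    return commands


def upload(local_dir, hdfs_dir, pattern="*.log"):
    """Run the commands; raises CalledProcessError if any copy fails."""
    for command in build_put_commands(local_dir, hdfs_dir, pattern):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical paths; print the commands instead of running them.
    for command in build_put_commands("/var/log/apache", "/logs/web01"):
        print(" ".join(command))
```

Building the command list separately from running it makes a dry run easy before anything is copied.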