Re: Large-scale collection of logs from multiple Hadoop nodes
We have been using a Flume-like system for such use cases at a significantly
large scale, and it has been working quite well.

I would also like to hear thoughts on, and challenges with, using ZeroMQ-like
systems at a reasonably large scale.

"you are the average of 5 people you spend the most time with"
On Aug 5, 2013 11:29 PM, "Public Network Services" wrote:

> Hi...
> I am facing a large-scale log collection scenario for a Hadoop
> cluster and am examining how it should be implemented.
> More specifically, imagine a cluster that has hundreds of nodes, each of
> which constantly produces Syslog events that need to be gathered and
> analyzed at another point. The total amount of logs could be tens of
> gigabytes per day, if not more, and the reception rate on the order of
> thousands of events per second, if not more.
> One solution is to send those events over the network (e.g., using
> Flume) and collect them in one or more (fewer than 5) nodes in the cluster,
> or in another location, where the logs will be processed either by a
> constantly running MapReduce job or by non-Hadoop servers running some log
> processing application.
> Another approach could be to deposit all these events into a queuing
> system such as ActiveMQ or RabbitMQ.
> In all cases, the main objective is to be able to do real-time log
> analysis.
> What would be the best way of implementing the above scenario?
> Thanks!
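
For anyone sizing a setup like the one quoted above, a rough back-of-envelope calculation may help. The sketch below is only an illustration: the figures of 50 GB per day, 5000 events per second, and 4 collector nodes are assumptions picked from inside the ranges mentioned in the thread ("tens of gigabytes per day", "thousands of events per second", "less than 5" nodes), and the function name `sizing` is hypothetical.

```python
# Back-of-envelope sizing for the log collection scenario in this thread.
# All concrete figures passed in below are assumptions, not measurements.

SECONDS_PER_DAY = 24 * 60 * 60


def sizing(gb_per_day, events_per_sec, collectors):
    """Return average throughput figures for the whole cluster and per collector."""
    # Average bytes per second across the cluster (decimal gigabytes).
    bytes_per_sec = gb_per_day * 1e9 / SECONDS_PER_DAY
    return {
        "mb_per_sec_total": bytes_per_sec / 1e6,
        "avg_event_bytes": bytes_per_sec / events_per_sec,
        "events_per_sec_per_collector": events_per_sec / collectors,
        "mb_per_sec_per_collector": bytes_per_sec / 1e6 / collectors,
    }


# Assumed values chosen from the ranges quoted in the original message.
estimate = sizing(gb_per_day=50, events_per_sec=5000, collectors=4)
for name, value in estimate.items():
    print(f"{name}: {value:.2f}")
```

These are daily averages only. Real syslog traffic tends to be bursty, so peak rates may be considerably higher than what this estimate suggests.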