Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce >> mail # user >> Large-scale collection of logs from multiple Hadoop nodes


Copy link to this message
-
Re: Large-scale collection of logs from multiple Hadoop nodes
We have been using a flume like system for such usecases at significantly
large scale and it has been working quite well.

Would like to hear thoughts/challenges around using zeromq alike systems at
good enough scale.

inder
"you are the average of 5 people you spend the most time with"
On Aug 5, 2013 11:29 PM, "Public Network Services" <
[EMAIL PROTECTED]> wrote:

> Hi...
>
> I am facing a large-scale usage scenario of log collection from a Hadoop
> cluster and examining ways as to how it should be implemented.
>
> More specifically, imagine a cluster that has hundreds of nodes, each of
> which constantly produces Syslog events that need to be gathered an
> analyzed at another point. The total amount of logs could be tens of
> gigabytes per day, if not more, and the reception rate in the order of
> thousands of events per second, if not more.
>
> One solution is to send those events over the network (e.g., using using
> flume) and collect them in one or more (less than 5) nodes in the cluster,
> or in another location, whereby the logs will be processed by a either
> constantly MapReduce job, or by non-Hadoop servers running some log
> processing application.
>
> Another approach could be to deposit all these events into a queuing
> system like ActiveMQ or RabbitMQ, or whatever.
>
> In all cases, the main objective is to be able to do real-time log
> analysis.
>
> What would be the best way of implementing the above scenario?
>
> Thanks!
>
> PNS
>
>