MapReduce >> mail # user >> Large-scale collection of logs from multiple Hadoop nodes


Re: Large-scale collection of logs from multiple Hadoop nodes
We have been using a Flume-like system for such use cases at significantly
large scale, and it has been working quite well.

I would like to hear thoughts on, and challenges with, using ZeroMQ-like
systems at a sufficiently large scale.

inder
"you are the average of 5 people you spend the most time with"
On Aug 5, 2013 11:29 PM, "Public Network Services" <
[EMAIL PROTECTED]> wrote:

> Hi...
>
> I am facing a large-scale log collection scenario for a Hadoop cluster
> and am examining how it should be implemented.
>
> More specifically, imagine a cluster that has hundreds of nodes, each of
> which constantly produces Syslog events that need to be gathered and
> analyzed at another point. The total amount of logs could be tens of
> gigabytes per day, if not more, and the reception rate on the order of
> thousands of events per second, if not more.
>
> One solution is to send those events over the network (e.g., using
> Flume) and collect them in one or more (fewer than 5) nodes in the cluster,
> or in another location, where the logs will be processed either by a
> constantly running MapReduce job or by non-Hadoop servers running some log
> processing application.
>
> Another approach could be to deposit all these events into a queuing
> system like ActiveMQ or RabbitMQ, or whatever.
>
> In all cases, the main objective is to be able to do real-time log
> analysis.
>
> What would be the best way of implementing the above scenario?
>
> Thanks!
>
> PNS
>
>
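
As a rough illustration of the collector side of the scenario quoted above, here is a minimal Python sketch that parses the priority field of incoming Syslog lines and keeps running per-host counts by severity. The only fact it relies on is the standard Syslog convention that a line starts with "<PRI>", where PRI = facility * 8 + severity. The names parse_priority and SeverityCounter are hypothetical and are not part of Flume, ZeroMQ, or any system mentioned in this thread; how the lines reach the collector (Flume, a queue, raw sockets) is deliberately left out.

```python
import re
from collections import Counter

# Standard Syslog convention: a line starts with "<PRI>",
# where PRI = facility * 8 + severity.
_PRI = re.compile(r"^<(\d{1,3})>")

# Severity names indexed by numeric severity 0..7.
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]


def parse_priority(line):
    """Return (facility, severity), or None if the line has no valid PRI."""
    match = _PRI.match(line)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # 23 * 8 + 7 is the largest valid PRI value
        return None
    return divmod(pri, 8)


class SeverityCounter:
    """Hypothetical collector-side aggregator: counts per (host, severity)."""

    def __init__(self):
        self.counts = Counter()

    def feed(self, host, line):
        """Record one Syslog line received from the given host."""
        parsed = parse_priority(line)
        if parsed is None:
            # Keep malformed lines visible instead of silently dropping them.
            self.counts[(host, "unparsed")] += 1
            return
        _, severity = parsed
        self.counts[(host, SEVERITIES[severity])] += 1
```

A collector would call feed() once per received event, for example feed("node17", "<34>su: auth failure"), and periodically read counts to drive alerts. At thousands of events per second, the counting itself is unlikely to be the bottleneck; transport and buffering are where the Flume versus message-queue question matters.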