Re: State of Art in Hadoop Log aggregation
Just a clarification: Cloudera Manager is now free for any number of nodes.
Ref:
http://www.cloudera.com/content/cloudera/en/products/cloudera-manager.html

-Sandy
On Fri, Oct 11, 2013 at 7:05 AM, DSuiter RDX <[EMAIL PROTECTED]> wrote:

> Sagar,
>
> It sounds like you want a management console. We are using Cloudera
> Manager, but for 200 nodes you would need to license it; it is only free up
> to 50 nodes.
>
> The FOSS version of this is Ambari, iirc.
> http://incubator.apache.org/ambari/
>
> Flume will provide a Hadoop-integrated pipeline for ingesting data, though
> the data will still need to be analyzed and visualized separately. Kafka is
> a newer option for collecting and aggregating logs, but it is a separate
> project and will need a server of its own to manage.
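To make the Flume option concrete, a per-node agent along these lines could tail a daemon log into HDFS. This is only a sketch: the agent name, log path, and namenode URL are placeholders, and the `exec` source simply wraps the same `tail -F` you would run by hand (a host interceptor is assumed so each node's events land under their own directory):

```properties
# Per-node Flume agent: tail one daemon log and ship it to HDFS.
agent1.sources = tail1
agent1.channels = mem1
agent1.sinks = hdfs1

# exec source wrapping tail -F (log path is an example)
agent1.sources.tail1.type = exec
agent1.sources.tail1.command = tail -F /var/log/hadoop/hadoop-tasktracker.log
agent1.sources.tail1.channels = mem1
# stamp events with the originating host so they can be partitioned below
agent1.sources.tail1.interceptors = host1
agent1.sources.tail1.interceptors.host1.type = host

agent1.channels.mem1.type = memory
agent1.channels.mem1.capacity = 10000

# HDFS sink, partitioned by host and day (namenode URL is an example)
agent1.sinks.hdfs1.type = hdfs
agent1.sinks.hdfs1.channel = mem1
agent1.sinks.hdfs1.hdfs.path = hdfs://namenode:8020/logs/%{host}/%Y-%m-%d
agent1.sinks.hdfs1.hdfs.fileType = DataStream
agent1.sinks.hdfs1.hdfs.rollInterval = 300
```

One caveat with the memory channel is that events buffered in RAM are lost if the agent dies; a file channel trades throughput for durability.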
>
> We use Splunk also, since it is approved by our auditing compliance agency.
>
> Thanks,
> *Devin Suiter*
> Jr. Data Solutions Software Engineer
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>
>
> On Fri, Oct 11, 2013 at 9:54 AM, Alexander Alten-Lorenz <
> [EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> http://flume.apache.org
>>
>> - Alex
>>
>> On Oct 11, 2013, at 7:36 AM, Sagar Mehta <[EMAIL PROTECTED]> wrote:
>>
>> Hi Guys,
>>
>> We have a fairly decent-sized Hadoop cluster of about 200 nodes, and I was
>> wondering what the state of the art is if I want to aggregate and visualize
>> Hadoop ecosystem logs, particularly
>>
>>    1. Tasktracker logs
>>    2. Datanode logs
>>    3. HBase RegionServer logs
>>
>> One way is to use something like Flume on each node to aggregate the
>> logs and then something like Kibana -
>> http://www.elasticsearch.org/overview/kibana/ - to visualize the logs and
>> make them searchable.
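For the Kibana/Elasticsearch route above, the raw log lines would need to be parsed into structured documents before indexing. A minimal sketch for the default Log4j layout the Hadoop daemons use (the field names and the exact pattern are assumptions; real logs also contain multi-line stack traces this does not handle):

```python
import re

# Typical Log4j line from a TaskTracker/DataNode log, e.g.:
# 2013-10-11 07:36:12,345 INFO org.apache.hadoop.mapred.TaskTracker: ...
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<logger>\S+):\s+"
    r"(?P<message>.*)$"
)

def parse_log_line(line):
    """Split one Log4j-style line into a dict suitable for indexing,
    or return None if the line does not match (e.g. a stack-trace line)."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return m.groupdict()
```

Each returned dict could then be posted to Elasticsearch (e.g. via its bulk API) for Kibana to search.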
>>
>> However, I don't want to write another ETL pipeline for the Hadoop/HBase
>> logs themselves. We currently log in to each machine individually to
>> 'tail -F' the logs when there is a Hadoop problem on a particular node.
>>
>> We want a better, centralized way to look at the Hadoop logs when there is
>> an issue, without having to log in to 100 different machines, and was
>> wondering what the state of the art is in this regard.
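Until a proper pipeline is in place, the manual per-node tail workflow can at least be scripted from one box. A minimal sketch (the hostnames are hypothetical, and the default path assumes passwordless ssh to each node; the `run` hook exists so the remote call can be stubbed out):

```python
import subprocess

# Hypothetical node names -- replace with the real cluster hosts.
HOSTS = ["node001", "node002"]

def collect_tails(hosts, log_path, lines=50, run=None):
    """Fetch the last `lines` lines of `log_path` from each host.

    By default this shells out to `ssh <host> tail -n <lines> <log_path>`;
    pass `run` (a callable taking a hostname) to replace the remote call,
    e.g. for testing or to use a different transport.
    """
    if run is None:
        def run(host):
            return subprocess.check_output(
                ["ssh", host, "tail", "-n", str(lines), log_path],
                text=True,
            )
    return {host: run(host) for host in hosts}
```

Running it serially over 200 nodes would be slow; wrapping the loop in a thread pool, or using a tool like pdsh, is the usual next step.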
>>
>> Suggestions/Pointers are very welcome!!
>>
>> Sagar
>>
>>
>> --
>> Alexander Alten-Lorenz
>> http://mapredit.blogspot.com
>> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>>
>>
>