We have a fairly decent sized Hadoop cluster of about 200 nodes, and I was
wondering what the state of the art is if I want to aggregate and visualize
Hadoop ecosystem logs, particularly:
1. TaskTracker logs
2. DataNode logs
3. HBase RegionServer logs
One way would be to run something like Flume on each node to aggregate the
logs, and then use something like Kibana
(http://www.elasticsearch.org/overview/kibana/) to visualize them and make
them searchable.
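For concreteness, the per-node piece of that setup might look something like the Flume (NG) agent sketch below, which tails one daemon log and forwards events to a central collector over Avro. The hostname, port, and log path are placeholders I made up for illustration, not our actual layout:

```properties
# Per-node Flume NG agent: tail a Hadoop daemon log and ship it
# to a central collector over Avro.
agent.sources = tt-log
agent.channels = mem
agent.sinks = collector

# Exec source tailing the local log file (path is a placeholder)
agent.sources.tt-log.type = exec
agent.sources.tt-log.command = tail -F /var/log/hadoop/hadoop-mapred-tasktracker.log
agent.sources.tt-log.channels = mem

# In-memory channel buffering events between source and sink
agent.channels.mem.type = memory
agent.channels.mem.capacity = 10000

# Avro sink pointing at a central collector agent (hostname/port
# are assumptions -- adjust for your environment)
agent.sinks.collector.type = avro
agent.sinks.collector.hostname = log-collector.example.com
agent.sinks.collector.port = 4141
agent.sinks.collector.channel = mem
```

The collector side would then index the events into Elasticsearch for Kibana to query, and the same agent could grow additional sources for the DataNode and RegionServer logs.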
However, I don't want to write yet another ETL pipeline for the Hadoop/HBase
logs themselves. We currently log in to each machine individually and
'tail -F' the logs when there is a Hadoop problem on a particular node.
We want a better way to look at the Hadoop logs in a centralized fashion
when there is an issue, without having to log in to 100 different machines,
and I was wondering what the state of the art is in this regard.
Suggestions/Pointers are very welcome!!