Flume user mailing list: Archive Task Logs (Stdout, Stderr, Syslogs) and Job Tracker logs of a Hadoop Cluster for later analysis


Christian Schneider 2013-04-08, 14:33
Israel Ekpo 2013-04-08, 17:41
Re: Archive Task Logs (Stdout, Stderr, Syslogs) and Job Tracker logs of a Hadoop Cluster for later analysis
Hi Israel,

Thank you for this detailed answer.
I'll give it a try.

Best Regards,
Christian.
2013/4/8 Israel Ekpo <[EMAIL PROTECTED]>

> Christian,
>
> From your comments, it seems Flume will be the right tool for the task.
>
> The SpoolingDirectorySource would be a great choice for the task you have
> since the log data has already been generated.
>
> However, the Spooling Directory Source requires that the files be
> immutable.
>
> This means once a file is created or dropped in the spooling directory it
> cannot be modified.
>
> Consequently, you will not be able to just use the current log directory
> where the log files are continuously being appended to.
>
> I would recommend that you set aside a separate spooling directory for
> Flume, and then set up a cron job or scheduled task that periodically
> drops the logs into the spooling directory after traversing the
> symlinks and recursively processing the log directories.
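
A minimal sketch of such a staging script (the script name, the paths, and the five-minute quiescence window are assumptions for illustration, not details from this thread):

    #!/bin/sh
    # spool-hadoop-logs.sh -- hypothetical staging script, run from cron, e.g.:
    #   */10 * * * * /usr/local/bin/spool-hadoop-logs.sh
    SRC=/var/log/hadoop-0.20-mapreduce/userlogs
    SPOOL=/var/flume/spool
    TMP=/var/flume/spool-tmp    # must be on the same filesystem as $SPOOL

    # -L follows symlinks; -mmin +5 skips files modified in the last five
    # minutes, so a file that is still being appended to is never spooled.
    find -L "$SRC" -type f -mmin +5 | while read -r f; do
      # Flatten the path into the file name, since the source does not recurse.
      name=$(echo "$f" | tr '/' '_')
      # Skip files already spooled (Flume renames consumed files to *.COMPLETED).
      if [ -e "$SPOOL/$name" ] || [ -e "$SPOOL/$name.COMPLETED" ]; then
        continue
      fi
      # Copy to a staging directory, then rename into place: the rename is
      # atomic, so the file is immutable from the moment Flume can see it.
      cp "$f" "$TMP/$name" && mv "$TMP/$name" "$SPOOL/$name"
    done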
>
> The SpoolingDirectorySource currently does not recursively traverse the
> spooled folders.
>
> It assumes that all the files you plan to consume are in the root folder
> you are spooling.
>
> Use FileChannel as the channel; it is more reliable than MemoryChannel
> because queued events survive an agent restart.
>
> Depending on the type of analysis you want to conduct, the
> ElasticSearchSink might be a good choice for your sink.
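
A minimal sketch of such an agent configuration, assuming Flume 1.3+ with the Elasticsearch sink and its dependencies on the classpath; the directory paths, host, index name, and agent name are placeholders:

    # hadoop-logs.conf -- spooling directory source -> file channel -> Elasticsearch
    agent.sources = spool
    agent.channels = ch
    agent.sinks = es

    # Spooling Directory Source: consumes the immutable files dropped by the cron job
    agent.sources.spool.type = spooldir
    agent.sources.spool.spoolDir = /var/flume/spool
    agent.sources.spool.channels = ch

    # Durable, file-backed channel
    agent.channels.ch.type = file
    agent.channels.ch.checkpointDir = /var/flume/checkpoint
    agent.channels.ch.dataDirs = /var/flume/data

    # Elasticsearch sink for later search and analysis
    agent.sinks.es.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
    agent.sinks.es.hostNames = es-host:9300
    agent.sinks.es.indexName = hadoop-logs
    agent.sinks.es.clusterName = elasticsearch
    agent.sinks.es.channel = ch

The agent would then be started with something like:

    flume-ng agent --conf conf --conf-file hadoop-logs.conf --name agent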
>
> Feel free to review the user guide for other options for Sinks.
>
> http://flume.apache.org/FlumeUserGuide.html
>
> You can also set up your own custom sink if you have other centralized
> datastores in mind.
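
A custom sink is a small amount of Java; a skeleton for reference (the class, package, and "endpoint" property below are hypothetical, not from this thread):

    package com.example.flume;  // hypothetical package

    import org.apache.flume.Channel;
    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.Transaction;
    import org.apache.flume.conf.Configurable;
    import org.apache.flume.sink.AbstractSink;

    public class MyDatastoreSink extends AbstractSink implements Configurable {

      private String endpoint;

      @Override
      public void configure(Context context) {
        // Settings come from the agent configuration file.
        endpoint = context.getString("endpoint", "localhost:9000");
      }

      @Override
      public Status process() throws EventDeliveryException {
        Channel channel = getChannel();
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
          Event event = channel.take();
          if (event == null) {      // channel empty: tell Flume to back off
            tx.commit();
            return Status.BACKOFF;
          }
          // Write event.getBody() to your datastore here.
          tx.commit();
          return Status.READY;
        } catch (Throwable t) {
          tx.rollback();            // the event stays in the channel
          throw new EventDeliveryException("Failed to deliver event", t);
        } finally {
          tx.close();
        }
      }
    }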
>
> Spend some time to go through the user guide and developer guide so that
> you can get a better understanding of the architecture and use cases.
>
> http://flume.apache.org/FlumeUserGuide.html
>
> http://flume.apache.org/FlumeDeveloperGuide.html
>
>
> On 8 April 2013 10:33, Christian Schneider <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>> I need to collect log data from our Cluster.
>>
>> For this, I think I need to copy the contents of:
>> * JobTracker: /var/log/hadoop-0.20-mapreduce/history/
>> * TaskTracker: /var/log/hadoop-0.20-mapreduce/userlogs/
>>
>> It should also follow symlinks and copy recursively.
>>
>> Is Flume the right tool to do this?
>>
>> E.g. with the "Spooling Directory Source"?
>>
>> Best Regards,
>> Christian.
>>
>
>