Re: M/R Statistics

As an addendum, I looked to see what was installed with apt-cache and
got the following output:

kevin@devUbuntu05:~$ apt-cache search hadoop
python-mrjob - MapReduce framework for writing and running Hadoop Streaming jobs
ubuntu-orchestra-modules-hadoop - Modules mainly used by orchestra-management-server
flume-ng - reliable, scalable, and manageable distributed data collection application
hadoop - A software platform for processing vast amounts of data
hadoop-0.20-conf-pseudo - Hadoop installation in pseudo-distributed mode with MRv1
hadoop-0.20-mapreduce - A software platform for processing vast amounts of data
hadoop-0.20-mapreduce-jobtracker - JobTracker for Hadoop
hadoop-0.20-mapreduce-jobtrackerha - High Availability JobTracker for Hadoop
hadoop-0.20-mapreduce-tasktracker - Task Tracker for Hadoop
hadoop-0.20-mapreduce-zkfc - Hadoop MapReduce failover controller
hadoop-client - Hadoop client side dependencies
hadoop-conf-pseudo - Pseudo-distributed Hadoop configuration
hadoop-doc - Documentation for Hadoop
hadoop-hdfs - The Hadoop Distributed File System
hadoop-hdfs-datanode - Data Node for Hadoop
hadoop-hdfs-fuse - HDFS exposed over a Filesystem in Userspace
hadoop-hdfs-journalnode - Hadoop HDFS JournalNode
hadoop-hdfs-namenode - Name Node for Hadoop
hadoop-hdfs-secondarynamenode - Secondary Name Node for Hadoop
hadoop-hdfs-zkfc - Hadoop HDFS failover controller
hadoop-httpfs - HTTPFS for Hadoop
hadoop-mapreduce - The Hadoop MapReduce (MRv2)
hadoop-mapreduce-historyserver - MapReduce History Server
hadoop-yarn - The Hadoop NextGen MapReduce (YARN)
hadoop-yarn-nodemanager - Node manager for Hadoop
hadoop-yarn-proxyserver - Web proxy for YARN
hadoop-yarn-resourcemanager - Resource manager for Hadoop
hbase - HBase is the Hadoop database
hcatalog - Apache HCatalog is a table and storage management service.
hive - A data warehouse infrastructure built on top of Hadoop
hue-common - A browser-based desktop interface for Hadoop
hue-filebrowser - A UI for the Hadoop Distributed File System (HDFS)
hue-jobbrowser - A UI for viewing Hadoop map-reduce jobs
hue-jobsub - A UI for designing and submitting map-reduce jobs to Hadoop
hue-plugins - Plug-ins for Hadoop to enable integration with Hue
hue-shell - A shell for console based Hadoop applications
libhdfs0 - JNI Bindings to access Hadoop HDFS from C
mahout - A set of Java libraries for scalable machine learning.
oozie - A workflow and coordinator sytem for Hadoop jobs.
pig - A platform for analyzing large data sets using Hadoop
pig-udf-datafu - A collection of user-defined functions for Hadoop and Pig.
sqoop - Tool for easy imports and exports of data sets between databases and HDFS
sqoop2 - Tool for easy imports and exports of data sets between databases and HDFS
webhcat - WEBHcat provides a REST-like web API for HCatalog and related Hadoop components.
cdh4-repository - Cloudera's Distribution including Apache Hadoop

So it seems that MapReduce is at least packaged, but I don't see
anything in /etc/init.d to start it up. Any ideas?
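One caveat worth checking first: apt-cache search lists packages that are *available* in the configured repositories, not packages that are installed. A rough sketch of how to verify what is actually on the box and, if needed, pull in the MRv1 daemons (package names are taken from the apt-cache listing above; the exact init script names are an assumption and may vary by CDH release):

```shell
# apt-cache search shows what the repos offer, not what is installed.
# List the hadoop packages actually installed on this machine:
dpkg -l | grep hadoop

# If the MRv1 daemon packages are absent, install them; these packages
# are what should drop start scripts into /etc/init.d:
sudo apt-get install hadoop-0.20-mapreduce-jobtracker \
                     hadoop-0.20-mapreduce-tasktracker

# Then start the daemons (script names assumed from the package names):
sudo service hadoop-0.20-mapreduce-jobtracker start
sudo service hadoop-0.20-mapreduce-tasktracker start
```

If dpkg -l shows only hadoop-client or the bare hadoop package, that would explain the empty /etc/init.d.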

On Fri, Apr 26, 2013 at 10:37 AM, Rishi Yadav wrote:

  Do you see "retired jobs" on the JobTracker page? There is also a
"job tracker history" link at the bottom of the page.

something like this: http://nn.zettabyte.com:50030/jobtracker.jsp
Thanks and Regards,
Rishi Yadav

On Fri, Apr 26, 2013 at 7:36 AM, [EMAIL PROTECTED] wrote:
When I submit a simple "Hello World" M/R job like WordCount, it takes
less than 5 seconds. The texts show numerous methods for monitoring M/R
jobs as they are happening, but I have yet to see any that show
statistics about a job after it has completed. Obviously, simple jobs
that finish quickly don't leave time to fire up a web page or
monitoring tool to watch the job progress through the JobTracker and
TaskTracker, or to see which node it runs on. Any suggestions on how I
could see this kind of data *after* a job has completed?
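Besides the JobTracker web UI, MRv1 keeps per-job history that can be queried from the command line after the job finishes. A sketch using the hadoop job subcommands (the job ID and output path below are placeholders, not values from this thread):

```shell
# List every job the JobTracker knows about, including completed ones,
# with their final state:
hadoop job -list all

# Completion state and counters for a single job
# (the job ID here is a placeholder):
hadoop job -status job_201304261036_0001

# Task-level history for a finished job, read from the history files
# written alongside the job's output (the path is a placeholder):
hadoop job -history /user/kevin/wordcount-output
```

The -history output includes per-task start/finish times and which failed or were killed, which covers the "after the fact" view the web UI makes hard to catch for short jobs.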