Hadoop >> mail # user >> Understanding of the hadoop distribution system (tuning)

Elaine Gan 2012-09-11, 01:56
Re: Understanding of the hadoop distribution system (tuning)
Hello Elaine,

You did not mention your cluster size: the number of nodes and the cores in
each node.

What sort of work are you doing? 6 hours for 518MB of data is a huge amount
of time.

The number of map tasks would be roughly 518/64, i.e. about 8.

That many map tasks need to run to process your data.
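
As a rough check on that arithmetic, here is a minimal Python sketch of how
the split count could be estimated. It assumes the input is a single
splittable file of exactly 518MB, that the split size equals the 64MB block
size, and that the input format lets the last split run up to 10% over the
split size (the 1.1 slop factor in FileInputFormat, as far as I know). The
function name estimate_splits is made up for illustration.

```python
import math

# Assumed tolerance: the final split may be up to 10% larger than split_size.
SPLIT_SLOP = 1.1


def estimate_splits(file_size, split_size, slop=SPLIT_SLOP):
    """Estimate the number of input splits for one splittable file (sizes in bytes)."""
    if file_size <= 0:
        return 0
    splits = 0
    remaining = file_size
    # Carve off full-size splits while the remainder is clearly more than one split.
    while remaining / split_size > slop:
        splits += 1
        remaining -= split_size
    # Whatever is left (at most slop * split_size) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


MB = 1024 * 1024
print(math.ceil(518 / 64))                 # plain rounding up: 9
print(estimate_splits(518 * MB, 64 * MB))  # with the slop rule: 8
```

Under these assumptions the estimate is 8, which matches the Maps Total = 8
shown on the administration page, while plain rounding up would suggest 9.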

They can run on a single node or on multiple nodes, depending on the
available slots. Did you check the JobTracker page while the job was
running? There you can see on which node each task is being processed. Go to
the Running tasks page.
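
To make the point about slots concrete, here is a small Python sketch of the
capacity arithmetic only. The node counts are hypothetical, since the cluster
size was not given; the 20 slots per node and the 8 map tasks come from the
original message. It does not model how the scheduler actually places tasks.

```python
import math


def total_map_slots(num_nodes, slots_per_node):
    """Map slots available across the cluster at one time."""
    return num_nodes * slots_per_node


def map_waves(num_tasks, num_nodes, slots_per_node):
    """Number of rounds needed if every slot runs one task per round."""
    slots = total_map_slots(num_nodes, slots_per_node)
    if slots <= 0:
        raise ValueError("the cluster needs at least one map slot")
    return math.ceil(num_tasks / slots)


MAP_TASKS = 8        # Maps Total reported for the job
SLOTS_PER_NODE = 20  # mapred.tasktracker.map.tasks.maximum

for nodes in (1, 4):  # hypothetical cluster sizes
    print(nodes,
          total_map_slots(nodes, SLOTS_PER_NODE),
          map_waves(MAP_TASKS, nodes, SLOTS_PER_NODE))
```

With 20 map slots per node, all 8 tasks fit into the slots of a single
TaskTracker, so capacity alone does not force the work to spread across
nodes. The Running tasks page shows where the tasks actually ran.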


Jagat Singh
On Tue, Sep 11, 2012 at 11:56 AM, Elaine Gan <[EMAIL PROTECTED]> wrote:

> Hi,
> I'm new to Hadoop and I've just played around with MapReduce.
> I would like to check whether my understanding of Hadoop is correct, and I
> would appreciate it if anyone could correct me if I'm wrong.
> I have around 518MB of data, and I wrote an MR program to process it.
> Here are some of my settings in my mapred-site.xml.
> ---------------------------------------------------------------
> mapred.tasktracker.map.tasks.maximum = 20
> mapred.tasktracker.reduce.tasks.maximum = 20
> ---------------------------------------------------------------
> My block size is the default, 64MB.
> With my data size = 518MB, I guess setting the maximum for MR tasks to 20
> is far more than enough (518/64 = 8). Did I get that correctly?
> When I run the MR program, I can see in the Map/Reduce Administration
> page that the number of Maps Total = 8, so I assume that everything is
> going well here. Once again, if I'm wrong please correct me.
> (Sometimes it shows only Maps Total = 3.)
> There's one thing I'm uncertain about regarding Hadoop's distribution.
> Does Maps Total = 8 mean that there are 8 map tasks split among all
> the data nodes (task trackers)?
> Is there any way I can check whether all the tasks are shared among the
> data nodes (where the task trackers are running)?
> When I click on each link under the Task Id, I can see "Input
> Split Locations" stated under each task's details. If the inputs are
> split between data nodes, does that mean that everything is working
> well?
> I need to make sure I have everything running well, because my MR job took
> around 6 hours to finish even though the input size is small. (Well, I know
> Hadoop is not meant for small data.) I'm not sure whether my
> configuration is wrong or Hadoop is just not suitable for my case.
> I'm actually running a Mahout k-means analysis.
> Thank you for your time.
Bejoy Ks 2012-09-11, 06:42