MapReduce, mail # user - Question about how to find which file takes the longest time to process and how to assign more mappers to process that particular file


Huanchen Zhang 2012-10-04, 22:03
Re: Question about how to find which file takes the longest time to process and how to assign more mappers to process that particular file
Hemanth Yamijala 2012-10-05, 04:21
Hi,

Roughly, this information is available on the 'Hadoop map task
list' page of the MapReduce web UI (in Hadoop 1.0, which I am assuming is
what you are using). You can reach this page by selecting the running tasks
link from the job information page. The page has a table that lists all the
tasks and under the status column tells you which part of the input is
being processed. Please note that, depending on the input format chosen, a
task may be processing only a *part* of a file, not necessarily a whole file.
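To pull out which file and byte range the slowest tasks are handling, here is a minimal sketch in plain Java (no Hadoop dependency). It assumes you have copied each task's ID, status text and elapsed time from the task list page by hand, and that the status text has the `path:start+length` form that file-based splits usually print. The TaskRow class, the sample paths and the timings are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SlowTaskFinder {

    /** One row copied by hand from the map task list page (hypothetical format). */
    static final class TaskRow {
        final String taskId;
        final String status;   // assumed form: "hdfs://nn/data/part-00007:0+67108864"
        final long elapsedMs;

        TaskRow(String taskId, String status, long elapsedMs) {
            this.taskId = taskId;
            this.status = status;
            this.elapsedMs = elapsedMs;
        }
    }

    /** Splits a "path:start+length" status string into {path, start, length}. */
    static String[] parseSplit(String status) {
        // Use the LAST ':' and '+' because the path itself contains "hdfs://" and maybe a port.
        int colon = status.lastIndexOf(':');
        int plus = status.lastIndexOf('+');
        if (colon < 0 || plus < colon) {
            throw new IllegalArgumentException("Not a file split: " + status);
        }
        return new String[] {
            status.substring(0, colon),
            status.substring(colon + 1, plus),
            status.substring(plus + 1)
        };
    }

    /** Returns the n slowest tasks, slowest first. */
    static List<TaskRow> slowest(List<TaskRow> rows, int n) {
        List<TaskRow> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingLong((TaskRow r) -> r.elapsedMs).reversed());
        return sorted.subList(0, Math.min(n, sorted.size()));
    }

    public static void main(String[] args) {
        List<TaskRow> rows = new ArrayList<>();
        rows.add(new TaskRow("m_000000", "hdfs://nn/data/a.log:0+67108864", 60_000));
        rows.add(new TaskRow("m_000001", "hdfs://nn/data/b.log:0+67108864", 310_000));
        rows.add(new TaskRow("m_000002", "hdfs://nn/data/c.log:0+1048576", 55_000));
        // Print the file and byte range behind the slowest task.
        for (TaskRow r : slowest(rows, 1)) {
            String[] s = parseSplit(r.status);
            System.out.println(r.taskId + " " + s[0] + " bytes " + s[1] + "+" + s[2]
                    + " took " + r.elapsedMs + " ms");
        }
    }
}
```

If the start offset is non-zero, or the length is smaller than the file, the slow task is working on only part of that file.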

Another good source of information on why these particular tasks are
slow is the job's counters. These counters can also be accessed from the
web UI, via the task list page.
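Once you have the counters, one rough way to read them is to compare each slow task's input against the typical task: if it read far more records, the input is skewed; if it read about the same number, each record is simply expensive to process. The sketch below assumes per-task elapsed times and 'Map input records' values typed in by hand (the numbers are made up), and the 2x thresholds are arbitrary choices, not anything Hadoop defines.

```java
import java.util.Arrays;

public class SkewCheck {

    /** Median of a copy of the input (mean of the two middle values when n is even). */
    static double median(long[] values) {
        long[] v = values.clone();
        Arrays.sort(v);
        int mid = v.length / 2;
        return v.length % 2 == 1 ? v[mid] : (v[mid - 1] + v[mid]) / 2.0;
    }

    /**
     * Labels a task OK, DATA_SKEW (much more input than typical) or
     * SLOW_PROCESSING (similar input, much lower throughput). Thresholds are arbitrary.
     */
    static String classify(long elapsedMs, long records, double medianMs, double medianRecords) {
        if (elapsedMs <= 2.0 * medianMs) {
            return "OK";
        }
        return records > 2.0 * medianRecords ? "DATA_SKEW" : "SLOW_PROCESSING";
    }

    public static void main(String[] args) {
        // Hypothetical per-task values: elapsed time and "Map input records".
        String[] ids = {"m_000000", "m_000001", "m_000002", "m_000003", "m_000004"};
        long[] elapsedMs = {60_000, 62_000, 58_000, 300_000, 310_000};
        long[] records = {1_000_000, 1_050_000, 980_000, 5_200_000, 990_000};

        double medMs = median(elapsedMs);
        double medRec = median(records);
        for (int i = 0; i < ids.length; i++) {
            double perSec = records[i] / (elapsedMs[i] / 1000.0);
            System.out.printf("%s %s %.0f records/s%n",
                    ids[i], classify(elapsedMs[i], records[i], medMs, medRec), perSec);
        }
    }
}
```

With these sample values, m_000003 is labelled DATA_SKEW and m_000004 SLOW_PROCESSING; only the first case is helped by giving that input more mappers.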

It would help if you could provide more information, such as what job
you're trying to run, the input format specified, etc.
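On the second question (getting more mappers onto a large file): for splittable file input, the number of map tasks per file comes from the split size the input format picks. The sketch below is a standalone re-implementation for illustration only, modelled on what the new-API FileInputFormat is assumed to do: split size = max(minSize, min(maxSize, blockSize)), with the last split allowed to run up to 10% over. Lowering the maximum split size (for example with FileInputFormat.setMaxInputSplitSize in the new API) should therefore give a big file more mappers, though it applies to every file in the job, not only the slow ones. The older mapred API derives the cap from the requested number of map tasks instead, and non-splittable input such as gzip files always gets one mapper per file.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    // Assumed to mirror FileInputFormat's slop: the last split may be up to 10% larger.
    static final double SPLIT_SLOP = 1.1;

    /** splitSize = max(minSize, min(maxSize, blockSize)), as assumed for the new API. */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Returns the split lengths (in bytes) one file of fileLen bytes would be cut into. */
    static List<Long> splitLengths(long fileLen, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLen;
        // Emit full-size splits while more than 1.1 splits' worth of bytes remain.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        // Whatever is left becomes the final (possibly short or slightly long) split.
        if (remaining != 0) {
            lengths.add(remaining);
        }
        return lengths;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // One 300 MB file on 128 MB blocks: 3 map tasks.
        System.out.println(splitLengths(300 * mb, 128 * mb, 1, Long.MAX_VALUE).size());
        // Capping the split size at 64 MB gives the same file 5 map tasks.
        System.out.println(splitLengths(300 * mb, 128 * mb, 1, 64 * mb).size());
    }
}
```

Because of the slop, a 140 MB file on 128 MB blocks still ends up as a single split.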

Thanks
hemanth

On Fri, Oct 5, 2012 at 3:33 AM, Huanchen Zhang <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I have a question about how to find which file takes the longest time to
> process and how to assign more mappers to process that particular file.
>
> Currently, about three mappers take about five times longer than the others
> to complete. How can I detect which specific files those three mappers
> are processing? If that is doable, how can I assign more mappers to
> process those specific files?
>
> Thank you !
>
> Best,
> Huanchen