Re: Question about how to find which file takes the longest time to process and how to assign more mappers to process that particular file
Hemanth Yamijala 2012-10-05, 04:21
Hi,
Roughly, this information will be available on the 'Hadoop map task list' page in the MapReduce web UI (in Hadoop 1.0, which I am assuming is what you are using). You can reach this page by selecting the running tasks link from the job information page. The page has a table that lists all the tasks, and the status column tells you which part of the input each task is processing. Please note that, depending on the input format chosen, a task may be processing a *part* of a file, and not necessarily a whole file.

Another good source of information on why these particular tasks are slow is the job's counters. These counters can also be accessed in the web UI, from the task list page.

It would help more if you can provide more information - like what job you're trying to run, the input format specified, etc.

Thanks
hemanth

On Fri, Oct 5, 2012 at 3:33 AM, Huanchen Zhang <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I have a question about how to find which file takes the longest time to
> process and how to assign more mappers to process that particular file.
>
> Currently, about three mapper takes about five times more time to
> complete. So, how can I detect which specific files are those three mapper
> are processing? If above if doable, how can I assign more mappers to
> process those specific files?
>
> Thank you !
>
> Best,
> Huanchen
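
As a rough illustration of reading the status column described in the reply, here is a small Python sketch that groups slow map tasks by input file. It assumes the status text has the form `<path>:<start>+<length>` (the string form of a file split); check this against what your task list page actually shows. The function names, the `factor` threshold, and the idea of copying (status, duration) pairs out of the web UI by hand are all hypothetical choices made for this example, not part of Hadoop.

```python
import re
from collections import defaultdict

# Assumption: a map task's status looks like "<path>:<start>+<length>",
# e.g. "hdfs://nn:9000/data/a.log:0+67108864".
SPLIT_RE = re.compile(r"^(?P<path>.+):(?P<start>\d+)\+(?P<length>\d+)$")


def parse_split(status):
    """Return (path, start_offset, length) parsed from a status string."""
    match = SPLIT_RE.match(status.strip())
    if match is None:
        raise ValueError("unrecognized split status: %r" % status)
    return match.group("path"), int(match.group("start")), int(match.group("length"))


def slow_tasks_by_file(tasks, factor=3.0):
    """Group tasks slower than factor * median duration by input file.

    tasks: list of (status_string, duration_in_seconds) pairs,
           collected by hand from the task list page (hypothetical input).
    Returns {path: [(start_offset, length, seconds), ...]}.
    """
    if not tasks:
        return {}
    durations = sorted(seconds for _, seconds in tasks)
    median = durations[len(durations) // 2]  # upper median for even counts
    slow = defaultdict(list)
    for status, seconds in tasks:
        if seconds > factor * median:
            path, start, length = parse_split(status)
            slow[path].append((start, length, seconds))
    return dict(slow)
```

Calling `slow_tasks_by_file` on the pairs copied from the page returns only the files (and the byte ranges within them) whose tasks ran much longer than the median, which also shows whether a slow task covered a whole file or just one part of it.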