How to combine input files for a MapReduce job


Agarwal, Nikhil 2013-05-13, 07:20
Re: How to combine input files for a MapReduce job
Look into mapred.max.split.size and mapred.min.split.size, and the number of
mappers, in mapred-site.xml.

Thanks & Regards,

Shashwat Shriparv
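
A minimal sketch of applying those two properties per job rather than
cluster-wide (an assumption of this sketch, not something stated in the
thread), using the pre-YARN property names in use at the time. Note that
these bounds only control how a single large file is carved into splits;
on their own they do not merge separate small files into one split:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SplitSizeSettings {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Lower and upper bounds, in bytes, on the splits that the new-API
    // FileInputFormat carves out of each input file. 1000 tiny files still
    // yield 1000 splits, because FileInputFormat never crosses file
    // boundaries when forming a split.
    conf.setLong("mapred.min.split.size", 64L * 1024 * 1024);
    conf.setLong("mapred.max.split.size", 128L * 1024 * 1024);

    Job job = new Job(conf, "wordcount");
    // ... set mapper, reducer, input and output paths as usual ...
  }
}

The same keys can instead be placed in mapred-site.xml as <property> entries
if they should apply cluster-wide.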

On Mon, May 13, 2013 at 12:50 PM, Agarwal, Nikhil <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I have a 3-node cluster, with the JobTracker running on one machine and
> TaskTrackers on the other two. Instead of using HDFS, I have written my own
> FileSystem implementation. As an experiment, I kept 1000 text files (all of
> the same size) on both slave nodes and ran a simple WordCount MR job. It
> took around 50 minutes to complete. Afterwards, I concatenated all 1000
> files into a single file and ran the WordCount job again; it took 35
> seconds. From the JobTracker UI I could make out that the problem is the
> number of mappers the JobTracker creates: for 1000 files it creates 1000
> maps, and for 1 file it creates 1 map (irrespective of file size).
>
> Thus, is there a way to reduce the number of mappers, i.e. can I control
> the number of mappers through some configuration parameter so that Hadoop
> would club the files together until it reaches some specified size (say,
> 64 MB) and then create 1 map per 64 MB block?
>
> Also, I wanted to know how to see which file is being submitted to which
> TaskTracker, or, if that is not possible, how to check whether any data
> transfer happens between my slave nodes during an MR job?
>
> Sorry for so many questions, and thank you for your time.
>
> Regards,
>
> Nikhil
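
The behaviour asked about above (clubbing many files into roughly 64 MB
splits) is what CombineFileInputFormat was built for; the split-size
properties alone cannot do it, because the stock FileInputFormat creates at
least one split per file. Below is a minimal sketch of a WordCount driver
using the new-API CombineTextInputFormat, assuming a Hadoop release that
ships that class (on older releases you would subclass CombineFileInputFormat
yourself):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CombinedWordCount {

  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Emit (word, 1) for every token in the line.
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      // Sum the counts for each word.
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "combined-wordcount");
    job.setJarByClass(CombinedWordCount.class);

    // Pack whole small files together until a split reaches ~64 MB, so the
    // framework schedules one map task per ~64 MB of input rather than one
    // map task per file.
    job.setInputFormatClass(CombineTextInputFormat.class);
    CombineTextInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);

    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With 1000 small files totalling, say, 600 MB, this should schedule on the
order of ten map tasks instead of 1000.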
Other replies in this thread:
Harsh J 2013-05-13, 07:43
Harsh J 2013-05-13, 07:32
Agarwal, Nikhil 2013-05-13, 07:55
Harsh J 2013-05-13, 07:58