Hadoop user mailing list: how can i increase the number of mappers?


Jane Wayne 2012-03-21, 06:07
Anil Gupta 2012-03-21, 06:37
Re: how can i increase the number of mappers?
as i understand it, that class does not exist for the new API in hadoop
v0.20.2 (which is what i am using). if i am mistaken, where is it?

i am looking at hadoop v1.0.1, and there is an NLineInputFormat class
there. i wonder if i can simply copy/paste it into my project.
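
For reference, the idea behind NLineInputFormat (each map task receives a fixed number of input lines) can be sketched in plain Java with no Hadoop dependency. This is only an illustration, not the Hadoop class itself: `LinesPerSplitSketch` and its methods are hypothetical names, and it assumes the matrix is stored as text with one row per line, which the thread does not state. As far as I know, the old-API class reads its line count from the `mapred.line.input.format.linespermap` property; treat that as something to verify against your Hadoop version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: it only mimics the idea of
// handing each map task a fixed number of input lines.
class LinesPerSplitSketch {

    // Lines per split needed so that `mappers` tasks cover `totalLines` (rounded up).
    static int linesPerSplit(int totalLines, int mappers) {
        if (totalLines < 0 || mappers <= 0) {
            throw new IllegalArgumentException("need totalLines >= 0 and mappers > 0");
        }
        return (totalLines + mappers - 1) / mappers;
    }

    // Returns {startLine, endLineExclusive} pairs, one pair per split.
    static List<int[]> splits(int totalLines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<int[]> out = new ArrayList<int[]>();
        for (int start = 0; start < totalLines; start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, totalLines);
            out.add(new int[] { start, end });
        }
        return out;
    }

    public static void main(String[] args) {
        int rows = 10000;                  // matrix rows, from the original question
        int n = linesPerSplit(rows, 10);   // aim for 10 mappers
        List<int[]> s = splits(rows, n);
        System.out.println(n + " lines per split, " + s.size() + " splits");
    }
}
```

With 10,000 rows and a target of 10 mappers this gives 1,000 lines per split; asking for 500 lines per split instead would give 20 splits.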

On Wed, Mar 21, 2012 at 2:37 AM, Anil Gupta <[EMAIL PROTECTED]> wrote:

> Have a look at the NLineInputFormat class in Hadoop. That class should
> serve your purpose.
>
> Best Regards,
> Anil
>
> On Mar 20, 2012, at 11:07 PM, Jane Wayne <[EMAIL PROTECTED]> wrote:
>
> > i have a matrix that i am performing operations on. it is 10,000 rows by
> > 5,000 columns. the total size of the file is just under 30 MB. my HDFS
> > block size is set to 64 MB. from what i understand, the number of mappers
> > is roughly equal to the number of HDFS blocks used in the input, i.e. if
> > my input data spans 1 block, then only 1 mapper is created; if my data
> > spans 2 blocks, then 2 mappers will be created, etc.
> >
> > so, my 1 matrix file of just under 30 MB won't fill up a block of data,
> > and as such, only 1 mapper will be called upon the data. is this
> > understanding correct?
> >
> > if so, what i want to happen is for more than one mapper (let's say 10)
> > to work on the data, even though it remains on 1 block. my analysis (or
> > map/reduce job) is such that more than 1 mapper can work on different
> > parts of the matrix. for example, mapper 1 can work on the first 500
> > rows, mapper 2 can work on the next 500 rows, etc. how can i set up
> > multiple mappers (more than 1 mapper) to work on a file that resides in
> > only one block (or a file whose size is smaller than the HDFS block
> > size)?
> >
> > can i split the matrix into (let's say) 10 files? that will mean 30 MB /
> > 10 = 3 MB per file. then put each 3 MB file onto HDFS? will this increase
> > the chance of having multiple mappers work simultaneously on the
> > data/matrix? if i can increase the number of mappers, i think (pretty
> > sure) my implementation will improve in speed linearly.
> >
> > any help is appreciated.
>
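
The arithmetic in the question above can be checked with a small sketch. It encodes only the rough rule stated there (about one mapper per HDFS block a file spans, and at least one per non-empty file); the real split computation in Hadoop also depends on split-size settings, so read this as an approximation. `MapperEstimateSketch` and its methods are hypothetical names, and 30 MB is used as a stand-in for "just under 30 MB".

```java
// Hypothetical estimate, not Hadoop code: one map task per HDFS block that
// a file spans, with at least one task for any non-empty file.
class MapperEstimateSketch {
    static final long MB = 1024L * 1024L;

    // Rough mapper count for a single file (ceiling of fileBytes / blockBytes).
    static long mappersForFile(long fileBytes, long blockBytes) {
        if (blockBytes <= 0) {
            throw new IllegalArgumentException("blockBytes must be positive");
        }
        if (fileBytes <= 0) {
            return 0;
        }
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Rough mapper count for several input files: files are counted separately.
    static long mappersForFiles(long[] fileSizes, long blockBytes) {
        long total = 0;
        for (long size : fileSizes) {
            total += mappersForFile(size, blockBytes);
        }
        return total;
    }

    public static void main(String[] args) {
        long block = 64 * MB;

        // One matrix file of (at most) 30 MB on a 64 MB block size.
        System.out.println("single file: " + mappersForFile(30 * MB, block));

        // The same data split into ten files of 3 MB each.
        long[] parts = new long[10];
        java.util.Arrays.fill(parts, 3 * MB);
        System.out.println("ten files:   " + mappersForFiles(parts, block));
    }
}
```

Under this rule the single file yields 1 mapper and the ten 3 MB files yield 10, which matches the reasoning in the question. Whether the job then speeds up linearly also depends on per-task startup overhead and on how many map slots the cluster has free.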
Jane Wayne 2012-03-21, 16:10
Wei Shung Chung 2012-03-21, 17:12