Hadoop user mailing list: how can i increase the number of mappers?


Re: how can i increase the number of mappers?
as i understand it, that class does not exist for the new API in hadoop
v0.20.2 (which is what i am using). if i am mistaken, where is it?

i am looking at hadoop v1.0.1, and there is an NLineInputFormat class. i
wonder if i can simply copy/paste it into my project.
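
for reference, here's the rough wiring i have in mind once the class is
available, assuming the v1.0.1 new-API version
(org.apache.hadoop.mapreduce.lib.input.NLineInputFormat) copies over
cleanly. the job name, the paths, and the 500-lines-per-split figure are
just placeholders, not tested values:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MatrixJobSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "matrix-ops");
    // hand each mapper 500 lines of the matrix, so a 10,000-row
    // file should yield about 20 map tasks instead of 1
    NLineInputFormat.setNumLinesPerSplit(job, 500);
    job.setInputFormatClass(NLineInputFormat.class);
    FileInputFormat.addInputPath(job, new Path("/user/jane/matrix.txt"));
    FileOutputFormat.setOutputPath(job, new Path("/user/jane/matrix-out"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}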

On Wed, Mar 21, 2012 at 2:37 AM, Anil Gupta <[EMAIL PROTECTED]> wrote:

> Have a look at the NLineInputFormat class in Hadoop. That class will
> solve your problem.
>
> Best Regards,
> Anil
>
> On Mar 20, 2012, at 11:07 PM, Jane Wayne <[EMAIL PROTECTED]> wrote:
>
> > i have a matrix that i am performing operations on. it is 10,000 rows
> > by 5,000 columns. the total size of the file is just under 30 MB. my
> > HDFS block size is set to 64 MB. from what i understand, the number of
> > mappers is roughly equal to the number of HDFS blocks used in the
> > input, i.e. if my input data spans 1 block, then only 1 mapper is
> > created, if my data spans 2 blocks, then 2 mappers will be created,
> > etc...
> >
> > so, with my 1 matrix file of just under 30 MB, this won't fill up a
> > block of data, and being as such, only 1 mapper will be called upon
> > the data. is this understanding correct?
> >
> > if so, what i want to happen is for more than one mapper (let's say
> > 10) to work on the data, even though it remains on 1 block. my
> > analysis (or map/reduce job) is such that multiple mappers can work on
> > different parts of the matrix. for example, mapper 1 can work on the
> > first 500 rows, mapper 2 can work on the next 500 rows, etc... how can
> > i set up multiple mappers to work on a file that resides on only one
> > block (or a file whose size is smaller than the HDFS block size)?
> >
> > can i split the matrix into (let's say) 10 files? that would mean
> > 30 MB / 10 = 3 MB per file. i could then put each 3 MB file onto HDFS.
> > will this increase the chance of having multiple mappers work
> > simultaneously on the data/matrix? if i can increase the number of
> > mappers, i think (pretty sure) my implementation will improve in speed
> > linearly.
> >
> > any help is appreciated.
>
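
p.s. to the question at the bottom of the quoted mail: splitting the
matrix into 10 files shouldn't be necessary. an input split is not the
same thing as an HDFS block; from reading the v1.0.1 FileInputFormat
source, the split size is computed as max(minSize, min(maxSize,
blockSize)), so capping the max split size also carves several map tasks
out of one sub-block file. a minimal sketch against the same Job object
as above (the 3 MB cap is just the 30 MB / 10 figure from the quoted
mail, not a tuned value):

// cap each input split at 3 MB, so a ~30 MB input file gets split
// into roughly 10 map tasks, even though the whole file sits
// inside a single 64 MB HDFS block
FileInputFormat.setMaxInputSplitSize(job, 3L * 1024 * 1024);

one caveat: extra mappers only buy real speedup if the cluster has free
map slots to run them in parallel, so the linear improvement mentioned
above is an upper bound, not a guarantee.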