Hadoop, mail # user - providing the same input to more than one Map task


Re: providing the same input to more than one Map task
Ted Dunning 2011-04-23, 01:33
I would recommend taking this question to the Mahout mailing list.

The short answer is that matrix multiplication by a column vector is pretty
easy.  Each mapper reads the vector in the configure method and then does a
dot product for each row of the input matrix.  Results are reassembled into
a vector in the reducer.
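
A minimal sketch of that approach, written in plain Python that simulates the map and reduce phases locally instead of calling the Hadoop API. All function names here (`load_vector`, `map_row`, `reduce_to_vector`, `multiply`) are hypothetical, and `load_vector` only stands in for whatever the mapper would read in its configure method.

```python
def load_vector():
    # Assumption: stands in for reading the vector file once per mapper,
    # as the configure method would. The values are made up.
    return [1.0, 2.0, 3.0]


def map_row(row_index, row, vector):
    # One dot product per matrix row; emit (row index, dot product).
    yield row_index, sum(m * v for m, v in zip(row, vector))


def reduce_to_vector(pairs, n_rows):
    # Reassemble the emitted (index, value) pairs into the result vector.
    result = [0.0] * n_rows
    for i, value in pairs:
        result[i] = value
    return result


def multiply(matrix, vector):
    # Local driver: run the "map" over every row, then the "reduce".
    pairs = []
    for i, row in enumerate(matrix):
        pairs.extend(map_row(i, row, vector))
    return reduce_to_vector(pairs, len(matrix))


if __name__ == "__main__":
    matrix = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]]
    print(multiply(matrix, load_vector()))
```

Each call to `map_row` needs only one row plus the whole vector, which is why every mapper can work on its own chunk of the matrix independently.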

Mahout has special matrix structures to help with this.

On Fri, Apr 22, 2011 at 2:59 PM, Mehmet Tepedelenlioglu <
[EMAIL PROTECTED]> wrote:

> There is a way:
>
>
> http://hadoop.apache.org/common/docs/r0.18.3/mapred_tutorial.html#DistributedCache
>
> Are you working with a sparse matrix, or a full one?
>
>
> On Apr 22, 2011, at 2:33 PM, aanghelescu wrote:
>
> >
> > Hi all,
> >
> > I am trying to perform matrix-vector multiplication using Hadoop.
> >
> > So I have matrix M in one file and vector v in another. Obviously, the
> > files are of different sizes. Is it possible to make it so that each Map
> > task gets the whole vector v and a chunk of matrix M? I know what my map
> > and reduce functions should look like, but I don't know how to format the
> > input.
> >
> > Basically, I want my map function to output key-value pairs
> > (i, m[i,j]*v(j)), where i is the row number and j the column number; v(j)
> > is the jth element of v. The reduce function will then sum all the values
> > with the same key i, and that sum will be the ith element of my result
> > vector.
> >
> > Or can you suggest another way to do it?
> >
> > Thanks,
> > Alexandra
> > --
> > View this message in context:
> > http://old.nabble.com/providing-the-same-input-to-more-than-one-Map-task-tp31459012p31459012.html
> > Sent from the Hadoop core-user mailing list archive at Nabble.com.
> >
>
>
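
For comparison, the per-element scheme from the original question (emit (i, m[i,j]*v(j)) in the map, sum by key i in the reduce) can be sketched the same way. This is plain Python simulating the shuffle locally, not Hadoop code. The function names and the (i, j, value) entry format are assumptions made for illustration.

```python
from collections import defaultdict


def map_entry(i, j, m_ij, vector):
    # Emit (row index, partial product) for one matrix entry.
    yield i, m_ij * vector[j]


def reduce_sum(i, values):
    # Sum all partial products that share the same row index.
    return i, sum(values)


def run(entries, vector):
    # Local stand-in for the shuffle: group mapper output by key.
    grouped = defaultdict(list)
    for i, j, m_ij in entries:
        for key, value in map_entry(i, j, m_ij, vector):
            grouped[key].append(value)
    return dict(reduce_sum(i, vals) for i, vals in sorted(grouped.items()))


if __name__ == "__main__":
    # Assumed input format: one (row, column, value) triple per entry.
    entries = [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 4.0)]
    print(run(entries, [1.0, 1.0]))
```

Because only stored entries are emitted, this form suits a sparse matrix; a row with no stored entries simply does not appear in the output.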