

Re: providing the same input to more than one Map task
I would recommend taking this question to the Mahout mailing list.

The short answer is that matrix multiplication by a column vector is pretty
easy.  Each mapper reads the vector in the configure method and then does a
dot product for each row of the input matrix.  Results are reassembled into
a vector in the reducer.
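
To make the data flow concrete, here is a rough sketch in plain Python. It only simulates the map and reduce steps in memory and uses no Hadoop or Mahout API; the names `map_rows`, `reduce_to_vector`, and `splits` are made up for illustration, and the matrix is assumed to arrive as (row index, row values) records.

```python
def map_rows(rows, vector):
    """Simulated map task. `vector` stands in for the copy each mapper
    would load once at setup; every (row_index, row_values) record
    yields (row_index, dot product of that row with the vector)."""
    for i, row in rows:
        yield i, sum(m_ij * v_j for m_ij, v_j in zip(row, vector))


def reduce_to_vector(pairs, n_rows):
    """Simulated reduce step: place each value at its row index.
    Adding (rather than assigning) also handles the case where a row
    is emitted as several partial products."""
    result = [0.0] * n_rows
    for i, value in pairs:
        result[i] += value
    return result


# Hypothetical 3x2 matrix and length-2 vector.
matrix = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
vector = [10.0, 1.0]

# Two input splits, each handled by its own simulated map task.
records = list(enumerate(matrix))
splits = [records[:2], records[2:]]

pairs = [pair for split in splits for pair in map_rows(split, vector)]
product = reduce_to_vector(pairs, len(matrix))
print(product)  # [12.0, 34.0, 56.0]
```

In a real job the vector would have to be small enough to fit in each mapper's memory, since every task holds a full copy of it.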

Mahout has special matrix structures to help with this.

On Fri, Apr 22, 2011 at 2:59 PM, Mehmet Tepedelenlioglu <[EMAIL PROTECTED]> wrote:

> There is a way:
>
>
> http://hadoop.apache.org/common/docs/r0.18.3/mapred_tutorial.html#DistributedCache
>
> Are you working with a sparse matrix, or a full one?
>
>
> On Apr 22, 2011, at 2:33 PM, aanghelescu wrote:
>
> >
> > Hi all,
> >
> > I am trying to perform matrix-vector multiplication using Hadoop.
> >
> > So I have matrix M in a file, and vector v in another file. Obviously, files
> > are of different sizes. Is it possible to make it so that each Map task will
> > get the whole vector v and a chunk of matrix M? I know how my map and reduce
> > functions should look like, but I don't know how to format the input.
> >
> > Basically I want my map function to output key-value pairs (i,m[i,j]*v(j)),
> > where i is the row number, and j the column number; v(j) is the jth element
> > in v. And the reduce function will sum up all the values with the same key -
> > i, and that will be the ith element of my result vector.
> >
> > Or can you suggest another way to do it?
> >
> > Thanks,
> > Alexandra
> > --
> > View this message in context:
> > http://old.nabble.com/providing-the-same-input-to-more-than-one-Map-task-tp31459012p31459012.html
> > Sent from the Hadoop core-user mailing list archive at Nabble.com.
> >
>
>