


Matrix multiplication in Hadoop
Who is doing multiplication of large dense matrices using Hadoop? What is a good way to do that computation using Hadoop?
Thanks, Mike
+
Mike Spreitzer 2011-11-18, 16:59

Re: Matrix multiplication in Hadoop
I'm not sure, but I would suspect that Mahout has some low-level map/reduce jobs for this. You might start there.
Thanks, John C
+
John Conwell 2011-11-18, 17:02
+
Tom Peters 2011-11-18, 17:17

Re: Matrix multiplication in Hadoop
Is Hadoop the best tool for doing large matrix math? Sure, you can do it, but aren't there better tools for these types of problems?
Sent from a remote device. Please excuse any typos...
Mike Segel
+
Michel Segel 2011-11-18, 17:33

Re: Matrix multiplication in Hadoop
That's also an interesting question, but right now I am studying Hadoop and want to know how well dense MM can be done in Hadoop.
Thanks, Mike
+
Mike Spreitzer 2011-11-18, 17:39

RE: Matrix multiplication in Hadoop
Ok Mike,
First I admire that you are studying Hadoop.
To answer your question... not well.
Might I suggest that if you want to learn Hadoop, you try to find a problem which can easily be broken into a series of parallel tasks with minimal communication requirements between each task?
No offense, but if I may draw a parallel... what you're asking is akin to taking a normalized relational model and trying to run it as-is in HBase. Yes, it can be done. But it's not the best use of resources.
+
Michael Segel 2011-11-18, 18:48

RE: Matrix multiplication in Hadoop
Well, this mismatch may tell me something interesting about Hadoop. Matrix multiplication has a lot of inherent parallelism, so from very crude considerations it is not obvious that there should be a mismatch. Why is matrix multiplication ill-suited for Hadoop?
BTW, I looked into the Mahout documentation some, and did not find matrix multiplication there. It might be hidden inside one of the advertised algorithms; I looked at the documentation for a few, but did not notice mention of MM.
Thanks, Mike
+
Mike Spreitzer 2011-11-18, 19:52

Re: Matrix multiplication in Hadoop
On Friday, November 18, 2011, Mike Spreitzer <[EMAIL PROTECTED]> wrote:
> Why is matrix multiplication ill-suited for Hadoop?
IMHO, a huge issue here is the JVM's inability to fully support CPU-vendor-specific SIMD instructions and, by extension, optimized BLAS routines. Running a large MM task using Intel's MKL rather than relying on generic compiler optimization is orders of magnitude faster on a single multicore processor. I see almost no way that Hadoop could win such a CPU-intensive task against an MPI cluster with even a tenth of the nodes running a decently tuned BLAS library. Racing even against a single CPU might be difficult, given the I/O overhead.
Still, it's a reasonably common problem and we shouldn't murder the good in favor of the best. I'm certain an MM/LinAlg Hadoop library with even mediocre performance relative to C would get used.
-- Mike Davis
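A minimal sketch of the single-node gap being described, assuming NumPy is linked against an optimized BLAS (an assumption about the local build; timings vary by machine, so only agreement of the results is asserted here):

```python
import numpy as np

def naive_matmul(a, b):
    """Textbook O(n^3) triple loop over plain Python lists."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for t in range(k):
                s += a[i][t] * b[t][j]
            c[i][j] = s
    return c

rng = np.random.default_rng(0)
A = rng.random((64, 64))
B = rng.random((64, 64))
C_naive = np.array(naive_matmul(A.tolist(), B.tolist()))
C_blas = A @ B  # dispatches to whatever BLAS NumPy was built against
assert np.allclose(C_naive, C_blas)
```

Timing the two paths with `time.perf_counter` on a few thousand rows is what exposes the orders-of-magnitude gap claimed above.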
+
Mike Davis 2011-11-19, 03:39

RE: Matrix multiplication in Hadoop
Perhaps this is a good candidate for a native library, then?
The information and any attached documents contained in this message may be confidential and/or legally privileged. The message is intended solely for the addressee(s). If you are not the intended recipient, you are hereby notified that any use, dissemination, or reproduction is strictly prohibited and may be unlawful. If you are not the intended recipient, please contact the sender immediately by return email and destroy all copies of the original message.
+
Tim Broberg 2011-11-19, 16:34

Re: Matrix multiplication in Hadoop
Sounds like a job for next-gen map/reduce native libraries and GPUs. A modern-day Dr. Frankenstein, for sure.
+
Edward Capriolo 2011-11-19, 16:53

Re: Matrix multiplication in Hadoop
Right, I agree with Edward Capriolo: Hadoop + GPGPU is a better choice.
+
He Chen 2011-11-19, 17:04

Re: Matrix multiplication in Hadoop
You really don't need to wait...
If you're going to go down this path, you can use a JNI wrapper for the C/C++ GPU code... You can do that now...
If you want to go beyond 1D you can do it, but you have to get a bit creative... it's doable...
Mike Segel
+
Michel Segel 2011-11-19, 17:33

Re: Matrix multiplication in Hadoop
Did you try Hama?
There are many methods:
1) Use Hadoop MPI, which lets you run MPI MM code on Hadoop;
2) Hama is designed for MM;
3) Use pure Hadoop Java MapReduce.
I did this before, though it may not be the optimal algorithm. Put your first matrix in the DistributedCache and take each line of the second matrix as an input split. For each line, use a mapper to multiply that row against the first matrix in the DistributedCache. Use a reducer to collect the result matrix. This algorithm is limited by your DistributedCache size; it is suitable for multiplying a small matrix by a huge matrix.
Chen
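A sketch of the scheme Chen describes (a reading of the intent, not his actual code), simulated without Hadoop: the cached matrix A stands in for the DistributedCache copy, each row j of B is one input split, and a plain dict plays the reducer:

```python
from collections import defaultdict

A = [[1, 2], [3, 4]]  # small matrix; would live in the DistributedCache
B = [[5, 6], [7, 8]]  # large matrix; streamed row by row as input splits

def mapper(j, b_row, cached_a):
    # Row j of B combines with column j of A: emit (i, partial row of C),
    # since C[i][k] = sum over j of A[i][j] * B[j][k].
    for i, a_row in enumerate(cached_a):
        yield i, [a_row[j] * b for b in b_row]

def reducer(partials):
    # Sum the partial rows emitted under each result-row key i.
    acc = defaultdict(lambda: None)
    for i, row in partials:
        acc[i] = row if acc[i] is None else [x + y for x, y in zip(acc[i], row)]
    return [acc[i] for i in sorted(acc)]

C = reducer(p for j, row in enumerate(B) for p in mapper(j, row, A))
assert C == [[19, 22], [43, 50]]  # equals A x B
```

The DistributedCache limit Chen mentions shows up here as the requirement that all of A fit in each mapper's memory.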
+
He Chen 2011-11-19, 17:02

Re: Matrix multiplication in Hadoop
I agree Hama (and the BSP model) could be a good option; plus, Hama also supports MR nextgen now [1]. I know MM has been implemented with Hama in the past, so it may be worth asking on the mailing list. My 2 cents, Tommaso
[1] http://svn.apache.org/repos/asf/incubator/hama/trunk/yarn
+
Tommaso Teofili 2011-11-19, 17:28

Re: Matrix multiplication in Hadoop
Hey Mike,
In Mahout, one place where matrix multiplication is used is in the distributed Collaborative Filtering implementation. The recommendations there are generated by multiplying a co-occurrence matrix with a user vector. The user vector is treated as a single-column matrix, and then the matrix multiplication takes place there.
Regards Bejoy K S
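The co-occurrence-times-user-vector product described here can be sketched (hypothetical numbers, not Mahout's code) as one dot product per distributed matrix row:

```python
# Hypothetical item-item co-occurrence counts and a user preference vector.
cooccurrence = [[2, 1, 0],
                [1, 3, 1],
                [0, 1, 2]]
user = [1, 0, 2]  # the user vector, treated as a single-column matrix

def map_row(row, vec):
    # Each "mapper" holds one matrix row and computes one entry of the
    # recommendation vector as a dot product with the user vector.
    return sum(r * v for r, v in zip(row, vec))

recommendations = [map_row(row, user) for row in cooccurrence]
assert recommendations == [2, 3, 4]
```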
+
bejoy.hadoop@... 2011-11-19, 22:17

Re: Matrix multiplication in Hadoop
Hi, there are two solutions suggested so far that take advantage of either (a) a vector x matrix (your CF / Mahout example) or (b) a small matrix x large matrix (the earlier suggestion of putting the small matrix into the DistributedCache). It is not yet clear what a good approach is for (c) large matrix x large matrix.
+
Stephen Boesch 2011-11-19, 23:07

Re: Matrix multiplication in Hadoop
Look for uses of DistributedRowMatrix in the Mahout code. The existing Mahout jobs are generally end-to-end algorithm implementations which do things like matrix multiplication in the middle. Also, the Mahout algorithms generally prefer to use sparse data for distributed work.
What is a "large" matrix? You may find that you really don't need to go to the effort of using Hadoop.
Lance
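As a tiny illustration of that sparse preference (hypothetical data, not Mahout's DistributedRowMatrix API), rows stored as {column: value} dicts let a matrix-vector product touch only the nonzero entries:

```python
# Sparse rows: only nonzero columns are stored, so empty rows cost nothing.
sparse_rows = [{0: 2.0, 3: 1.0},  # row 0: nonzeros at columns 0 and 3
               {1: 5.0},          # row 1: one nonzero
               {}]                # row 2: all zeros
vec = [1.0, 2.0, 3.0, 4.0]

# Multiply each sparse row by the dense vector, skipping stored zeros.
result = [sum(v * vec[j] for j, v in row.items()) for row in sparse_rows]
assert result == [6.0, 10.0, 0.0]
```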
+
Lance Norskog 2011-11-19, 23:27

Re: Matrix multiplication in Hadoop
I am looking at large dense matrix multiplication as an example problem for a class of middleware. I am also interested in sparse matrices, but am taking things one step at a time.
There is a paper in IEEE CloudCom '10 about Hama, including a matrix multiplication technique. It is essentially the same as what is called "technique 4" in the 2009 monograph by John Norstad cited early in this thread. Which means that, despite the fact that Hama touts the virtues of BSP (a position with which I am very sympathetic), this technique doesn't really take advantage of the extra features that BSP has over MapReduce.
Note also that this technique creates intermediate data of much greater volume than the input. For example, if each matrix is stored as an NxN grid of blocks, the intermediate data (the blocks paired up, awaiting multiplication) is a factor of N larger than the input. I have heard people say that N may be rather larger than sqrt(number of machines), because in some circumstances N has to be chosen before the number of available machines is known and you want to be able to divide the NxN load among your machines rather evenly. Even if N is like sqrt(number of machines), this is still an unwelcome amount of bloat. In comparison, the SUMMA technique does matrix multiplication but its intermediate data volume is no greater than the input.
Thanks, Mike
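A quick count makes the factor-of-N claim concrete (a sketch under the stated NxN-grid-of-blocks assumption):

```python
def intermediate_blowup(n):
    """Ratio of shipped block copies to input blocks for the naive
    all-pairs job: block A[i][j] must meet block B[j][k] for every i, j, k."""
    input_blocks = 2 * n * n          # blocks of A plus blocks of B
    shipped = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                shipped += 2          # one copy of A[i][j], one of B[j][k]
    return shipped / input_blocks

# 2 * n^3 shipped copies over 2 * n^2 input blocks: a factor of exactly N.
assert intermediate_blowup(4) == 4.0
assert intermediate_blowup(10) == 10.0
```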
+
Mike Spreitzer 2011-11-23, 05:20

Re: Matrix multiplication in Hadoop
Team, I am not able to read the sequence file which the cluster gave. Please help. Problem: the sequence file is returning null.
Thanks and Regards,
S SYED ABDUL KATHER
9731841519
+
in.abdul 2011-11-24, 17:38

Re: Matrix multiplication in Hadoop
I'd really be interested in a comparison of NumPy/Octave/Matlab kinds of tools with a Hadoop implementation (let's say 4-10 large cloud servers) as the matrix size grows. I want to know the scale at which Hadoop really starts to pull away.
Ayon
+
Ayon Sinha 2011-11-18, 18:48

Re: Matrix multiplication in Hadoop
A problem with matrix multiplication in Hadoop is that Hadoop is row-oriented for the most part. I have thought about this use case, however, and you can theoretically turn a 2D matrix into a 1D matrix so that it fits the row-oriented nature of Hadoop. Also, since a typical mapper can have fairly large chunks of memory, like 1024 MB, I have done work like this before where I loaded such datasets into memory to process them. That usage does not really fit the map/reduce model. I have been wanting to look at: http://www.scidb.org/
Edward
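The 2D-to-1D flattening mentioned here is just row-major index arithmetic; a minimal sketch:

```python
def flatten(matrix):
    """Row-major flattening: entry (i, j) lives at flat index i*cols + j,
    so a 2D matrix can be stored as one flat, row-oriented record."""
    cols = len(matrix[0])
    flat = [v for row in matrix for v in row]
    def index(i, j):
        return i * cols + j
    return flat, index

flat, idx = flatten([[1, 2, 3], [4, 5, 6]])
assert flat == [1, 2, 3, 4, 5, 6]
assert flat[idx(1, 2)] == 6  # entry at row 1, column 2
```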
+
Edward Capriolo 2011-11-18, 19:46

