Sounds like Matt has the right combination of expertise in both databases and MapReduce to help you.  I'm bowing out, as I honestly don't know advanced database concepts at all.  In addition, Hive offers Hive-specific tools, such as the map-side joins Matt suggested, which I'm too new to Hive to speculate on.  I'm just starting with Hive this week, as a matter of fact.

The short answer on MapReduce algorithms is that the individual computational units can't communicate with each other: no mapper (in fact, no single map() call) can communicate with the others, and the same holds for reducers.  That's one of the major distinctions between MapReduce and more general parallel processing frameworks like MPI.  This is the wrong mailing list to go much deeper than that, however.
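
A minimal sketch of that constraint, in plain Python rather than real Hadoop or Hive code.  The names map_record, reduce_key, and run_job are made up for illustration, and the in-memory "shuffle" only stands in for what the framework does between the two phases.  The point is that each map call sees one record, each reduce call sees one key, and neither can look at what its siblings are doing:

```python
from collections import defaultdict


def map_record(record):
    """Hypothetical mapper: sees a single record and nothing else."""
    for word in record.split():
        yield word.lower(), 1


def reduce_key(key, values):
    """Hypothetical reducer: sees a single key and only that key's values."""
    return key, sum(values)


def run_job(records):
    """Toy driver standing in for the framework (an assumption, not Hadoop's API)."""
    # Shuffle step: group mapper output by key. User code never does this
    # itself; it is the only path by which data moves between the phases.
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)
    # Each key is reduced independently of every other key.
    return dict(reduce_key(key, values) for key, values in grouped.items())


counts = run_job(["Hive on Hadoop", "hadoop MapReduce"])
```

Because no call depends on another call's state, the framework is free to run them on different machines in any order, which is roughly the trade made against something like MPI, where processes can message each other directly.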

Thanks Matt.

Best of luck Mahsa.

On Mar 13, 2012, at 10:13, Tucker, Matt wrote:

________________________________________________________________________________
Keith Wiley     [EMAIL PROTECTED]     keithwiley.com    music.keithwiley.com

"Luminous beings are we, not this crude matter."
                                           --  Yoda
________________________________________________________________________________