NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase >> mail # user >> mapreduce on two tables


Re: mapreduce on two tables
Rohit,

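It'll depend on what processing you want to do on all documents for a
given author. You could either write author -> {list of documents} to
an HDFS file and then scan through that file with a second MR job to do
the processing, or you could simply emit <author, document> as the
output of the map stage and do the processing on <author, {list of
documents}> in the reduce stage of the same job. The framework groups
the map output by key, so each reduce call receives one author together
with all of that author's documents, and you don't need a separate
table of distinct authors.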
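A minimal in-memory sketch of the single-job approach, assuming rows shaped like Rohit's table (row key = documentid, plus content:author and content:text). It only simulates the map -> shuffle -> reduce data flow in plain Java and does not use the Hadoop or HBase API; the class, record and method names are hypothetical, and counting documents stands in for whatever per-author processing is actually needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of one MR job over the document table.
public class AuthorGrouping {

    // Stand-in for one HBase row: row key = documentid, plus content:author and content:text.
    record DocRow(String documentId, String author, String text) {}

    // A key/value pair emitted by the map stage.
    record Pair(String key, String value) {}

    // Map stage: each row becomes one <author, documentid> pair.
    static Pair map(DocRow row) {
        return new Pair(row.author(), row.documentId());
    }

    // Shuffle: group values by key, with keys sorted as Hadoop sorts them.
    // (Real Hadoop does not guarantee the order of values within a key.)
    static Map<String, List<String>> shuffle(List<Pair> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Pair p : pairs) {
            grouped.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
        }
        return grouped;
    }

    // Reduce stage: placeholder per-author processing that counts documents.
    static int reduce(String author, List<String> documentIds) {
        return documentIds.size();
    }

    // Runs map, shuffle and reduce over all rows.
    static Map<String, Integer> run(List<DocRow> rows) {
        List<Pair> mapped = new ArrayList<>();
        for (DocRow row : rows) {
            mapped.add(map(row));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<DocRow> rows = List.of(
                new DocRow("doc1", "alice", "..."),
                new DocRow("doc2", "bob", "..."),
                new DocRow("doc3", "alice", "..."));
        System.out.println(run(rows)); // prints {alice=2, bob=1}
    }
}
```

In a real job the map side would typically be an HBase TableMapper wired up with TableMapReduceUtil.initTableMapperJob, emitting the author as the output key and the documentid as the value, while a regular Reducer does the per-author work. The point of the sketch is that the grouping happens in the shuffle, so no pre-pass to collect distinct authors is required.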

-ak

On Mon, Nov 7, 2011 at 3:02 AM, Rohit Kelkar <[EMAIL PROTECTED]> wrote:
> I needed some feedback about the best way of implementing the following.
> In my document table I have documentid as the row-id, and content:author
> and content:text stored in each row. I want to process all documents
> pertaining to each author in a map reduce job, i.e. my map will take
> key=author and values="all documentids written by that author". But for
> this I would first have to find all distinct authors and store them in
> another table, then run the map-reduce job on the second table. Am I
> thinking in the right direction, or is there a better way to achieve
> this?
> - Rohit Kelkar
>