HBase user mailing list: mapreduce on two tables


Re: mapreduce on two tables
Rohit,

It'll depend on what processing you want to do on all documents for a
given author. One option is to write author -> {list of documents} to
an HDFS file and then scan through that file with a second MR job that
does the processing. Or you could simply emit <author, document> as the
output of the map stage and do the processing on <author, {list of
documents}> in the reduce stage of the same job. The shuffle groups the
map output by author, so you don't need to build a separate table of
distinct authors first.
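To illustrate the second option, here is a minimal plain-Python sketch of the data flow only; it is not the Hadoop or HBase API. The map step emits (author, documentid) for each row, a stand-in for the framework's shuffle groups those pairs by author, and the reduce step sees every documentid for one author. The in-memory row format (a dict keyed by "content:author" / "content:text") and the names map_phase, shuffle, reduce_phase and run_job are hypothetical; in a real job the mapper would read rows from the HBase table and Hadoop itself would do the grouping.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical in-memory stand-in for rows of the document table:
# (row key = documentid, {"content:author": ..., "content:text": ...}).
Row = Tuple[str, Dict[str, str]]


def map_phase(rows: Iterable[Row]) -> Iterator[Tuple[str, str]]:
    """Map stage: for each document row, emit (author, documentid)."""
    for doc_id, columns in rows:
        author = columns.get("content:author")
        if author is not None:  # skip rows that have no author column
            yield author, doc_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Stand-in for the framework's shuffle: group map output by key.

    Keys are sorted here only to make the output deterministic; Hadoop
    sorts keys within each reducer's partition.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for author, doc_id in pairs:
        grouped[author].append(doc_id)
    return dict(sorted(grouped.items()))


def reduce_phase(author: str, doc_ids: List[str]) -> Tuple[str, int]:
    """Reduce stage: called once per distinct author with all of that
    author's documentids. Placeholder logic: just count them."""
    return author, len(doc_ids)


def run_job(rows: Iterable[Row]) -> List[Tuple[str, int]]:
    """Single job: map, group by author, then reduce each group."""
    grouped = shuffle(map_phase(rows))
    return [reduce_phase(author, ids) for author, ids in grouped.items()]


if __name__ == "__main__":
    rows = [
        ("doc1", {"content:author": "alice", "content:text": "..."}),
        ("doc2", {"content:author": "bob", "content:text": "..."}),
        ("doc3", {"content:author": "alice", "content:text": "..."}),
    ]
    print(run_job(rows))
```

Note that no pass over the table to collect distinct authors is needed: the reducer is invoked once per distinct author key, which is exactly the per-author grouping the original question was trying to build by hand.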

-ak

On Mon, Nov 7, 2011 at 3:02 AM, Rohit Kelkar <[EMAIL PROTECTED]> wrote:
> I need some feedback on the best way to implement the following.
> In my document table I have documentid as the row id, and
> content:author and content:text stored in each row. I want to process
> all documents pertaining to each author in a map reduce job, i.e. my
> map will take key=author and values="all documentids by that author".
> But for this I would first have to find all distinct authors and
> store them in another table, then run a map-reduce job on the second
> table. Am I thinking in the right direction, or is there a better way
> to achieve this?
> - Rohit Kelkar
>