Re: output/input ratio > 1 for map tasks?
On Mon, Jul 30, 2012 at 11:47 AM, brisk <[EMAIL PROTECTED]> wrote:

> Hi,
> Does anybody know of cases where the output/input ratio for map tasks
> is larger than 1? I can only think of sort, where it's 1, and search
> jobs, where it's usually smaller than 1...

The traditional case is building an inverted index of some sort. The input
is the set of documents, the shuffle (that is, the map output) is the set
of search terms and their targets, and the output is the final index. The
shuffle is much larger than either the input or the output.

-- Owen
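
The inverted-index case above can be illustrated with a rough sketch in
plain Python. This is not Hadoop API code: the function names, the sample
documents, and the tab-separated record encoding are all assumptions made
for illustration. It only shows why a mapper that emits one (term, target)
record per token tends to write more bytes than it reads.

```python
# Hypothetical stand-in for a map task; this is not Hadoop API code.
def map_inverted_index(doc_id, text):
    """Emit one (term, (doc_id, position)) record per token in the document."""
    for position, term in enumerate(text.lower().split()):
        yield term, (doc_id, position)


def serialized_size(records):
    """Bytes used by an assumed tab-separated text encoding of the records."""
    return sum(
        len(f"{term}\t{doc_id}\t{position}\n".encode("utf-8"))
        for term, (doc_id, position) in records
    )


# Made-up sample input: document id -> document text.
documents = {
    "doc-0001": "the quick brown fox jumps over the lazy dog",
    "doc-0002": "the dog barks and the fox runs away",
}

# Map input: the raw document text.
input_bytes = sum(len(text.encode("utf-8")) for text in documents.values())

# Map output (what would go into the shuffle): one record per token.
map_output = [
    record
    for doc_id, text in documents.items()
    for record in map_inverted_index(doc_id, text)
]
output_bytes = serialized_size(map_output)

ratio = output_bytes / input_bytes
print(f"map input:  {input_bytes} bytes")
print(f"map output: {output_bytes} bytes")
print(f"output/input ratio: {ratio:.2f}")
```

Every token is re-emitted together with its document id and position, so
under this encoding each record is longer than the token it came from. A
real job's ratio would depend on the record format, any combiner, and
compression of the map output.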