On Mon, Jul 30, 2012 at 11:47 AM, brisk <[EMAIL PROTECTED]> wrote:
> Does anybody know if there are some cases where the output/input ratio for
> map tasks is larger than 1? I can just think of for the sort, it's 1 and
> for the search job it's usually smaller than 1...
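The traditional case is building an inverted index of some sort. Your input
is the set of documents, the shuffle (that is, the map output) is the set of
search terms and their targets, and the output is the final index. Each word
in a document becomes its own record carrying the term plus the document it
came from, so the shuffle is much larger than either the input or the output.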
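
To make that concrete, here is a rough single-process sketch in plain Python
(not Hadoop code). The names map_doc, reduce_postings and build_index are made
up for illustration, and measuring record size with repr() is only a crude
stand-in for whatever serialization a real job would use.

```python
from collections import defaultdict


def map_doc(doc_id, text):
    """Map step: emit one (term, (doc_id, position)) record per token."""
    for pos, term in enumerate(text.lower().split()):
        yield term, (doc_id, pos)


def reduce_postings(term, postings):
    """Reduce step: collapse a term's postings into sorted, distinct doc ids."""
    return term, sorted({doc_id for doc_id, _ in postings})


def build_index(docs):
    """Run map, shuffle and reduce in memory and track rough byte counts."""
    shuffle = defaultdict(list)  # stands in for the framework's group-by-key
    input_bytes = 0
    shuffle_bytes = 0
    for doc_id, text in docs.items():
        input_bytes += len(text.encode("utf-8"))
        for term, posting in map_doc(doc_id, text):
            shuffle[term].append(posting)
            # Assumption: repr() length approximates the serialized record size.
            shuffle_bytes += len(repr((term, posting)).encode("utf-8"))
    index = dict(reduce_postings(t, p) for t, p in shuffle.items())
    return index, input_bytes, shuffle_bytes


if __name__ == "__main__":
    docs = {"d1": "to be or not to be", "d2": "to do is to be"}
    index, n_in, n_shuffle = build_index(docs)
    print(index)
    print("map output/input ratio:", n_shuffle / n_in)
```

Every token is re-emitted together with its document id and position, which is
why the map output outgrows the input even on a toy corpus like this one.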