-
Re: output/input ratio > 1 for map tasks?
Niels Basjes 2012-07-30, 20:15
Hi,
On Mon, Jul 30, 2012 at 8:47 PM, brisk <[EMAIL PROTECTED]> wrote:
> Does anybody know if there are cases where the output/input ratio for
> map tasks is larger than 1? I can only think of sort, where it's 1, and
> the search job, where it's usually smaller than 1...
For a simple example, have a look at the WordCount example.

The input of a single map call is 1 record:
  "This is a line"
The output is 4 records:
  This 1
  is 1
  a 1
  line 1
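A minimal Python sketch of the WordCount map step described above, counting output records per input record. The helper names `wordcount_map` and `record_ratio` are hypothetical, and whitespace splitting only approximates the tokenizer used by the Hadoop example.

```python
from typing import Iterable, Iterator, List, Tuple


def wordcount_map(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for each whitespace-separated token in one input record."""
    for word in line.split():
        yield (word, 1)


def record_ratio(lines: Iterable[str]) -> float:
    """Map output records divided by map input records for a batch of lines."""
    batch: List[str] = list(lines)
    out_records = sum(1 for line in batch for _ in wordcount_map(line))
    return out_records / len(batch)


print(list(wordcount_map("This is a line")))
# [('This', 1), ('is', 1), ('a', 1), ('line', 1)]
print(record_ratio(["This is a line"]))  # 4.0
```

Any line with more than one word pushes the record ratio above 1.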
-- Best regards / Met vriendelijke groeten,
Niels Basjes
-
Re: output/input ratio > 1 for map tasks?
brisk 2012-07-30, 20:33
Thanks, Niels.
So do you mean that in this case I could expect the map output size (in bytes) to be larger than the input size (e.g. 64MB by default)? I will also run a test later...
Best, Ethan
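For the byte-size question, here is a rough sketch, under stated assumptions, that estimates WordCount's serialized map output for one line. It assumes a `Text` key is written as a vint length followed by its UTF-8 bytes and an `IntWritable` value as 4 bytes. It ignores the input's offset key, per-record framing in spill files, compression, and the combiner that the stock WordCount example configures, which can shrink spilled output substantially on real text. The helper names are hypothetical.

```python
def vint_size(n: int) -> int:
    """Bytes used by Hadoop's WritableUtils vint/vlong encoding for n."""
    if -112 <= n <= 127:
        return 1  # small values fit in a single byte
    if n < 0:
        n = ~n  # negative values are stored as their one's complement
    size = 1  # one header byte encoding sign and payload length
    while n != 0:
        n >>= 8
        size += 1  # one payload byte per 8 significant bits
    return size


def text_size(s: str) -> int:
    """Serialized size of a Text: vint length prefix plus UTF-8 bytes."""
    data = s.encode("utf-8")
    return vint_size(len(data)) + len(data)


INT_WRITABLE_SIZE = 4  # IntWritable is written as a 4-byte int


def map_output_bytes(line: str) -> int:
    """Approximate serialized (Text, IntWritable) bytes WordCount emits for one line."""
    return sum(text_size(word) + INT_WRITABLE_SIZE for word in line.split())


line = "This is a line"
print(len(line.encode("utf-8")), map_output_bytes(line))  # 14 31
```

Under these assumptions, 14 bytes of line content become about 31 bytes of map output, so a byte ratio above 1 is plausible for short words.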
-
Re: output/input ratio > 1 for map tasks?
Owen O'Malley 2012-07-30, 21:57
On Mon, Jul 30, 2012 at 11:47 AM, brisk <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Does anybody know if there are cases where the output/input ratio for
> map tasks is larger than 1? I can only think of sort, where it's 1, and
> the search job, where it's usually smaller than 1...
The traditional case is building an inverted index of some sort. Your input is the set of input documents, the shuffle is the set of search terms and their targets, and the output is the final index. The shuffle is much larger than either the input or the output.
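A toy, in-memory sketch of the inverted-index pattern described above, counting records at each stage. The names `index_map`, `index_reduce` and `build_index` are hypothetical, and lowercasing plus whitespace tokenization are assumptions. A real indexer might also emit positions or other per-occurrence data, which would make the shuffle larger still.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def index_map(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Map: emit one (term, doc_id) pair per term occurrence."""
    for term in text.lower().split():
        yield (term, doc_id)


def index_reduce(term: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    """Reduce: collapse all pairs for a term into one sorted posting list."""
    return (term, sorted(set(doc_ids)))


def build_index(docs: Dict[str, str]) -> Tuple[Dict[str, List[str]], int]:
    """Run map, an in-memory 'shuffle' (group by term), then reduce."""
    shuffle: Dict[str, List[str]] = defaultdict(list)
    shuffled_pairs = 0
    for doc_id, text in docs.items():
        for term, d in index_map(doc_id, text):
            shuffle[term].append(d)
            shuffled_pairs += 1
    index = dict(index_reduce(t, ids) for t, ids in sorted(shuffle.items()))
    return index, shuffled_pairs


docs = {"d1": "the quick brown fox", "d2": "the lazy dog and the fox"}
index, shuffled_pairs = build_index(docs)
print(len(docs), shuffled_pairs, len(index))  # 2 10 7
print(index["the"])  # ['d1', 'd2']
```

Here 2 input records become 10 map output records (a ratio of 5) before the reduce collapses them into 7 index entries.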
-- Owen