Re: Applications creates bigger output than input?
Niels Basjes 2011-05-19, 12:57
Something I've seen in the past is code that has the input
So the number of output records is the same as the length of the input text.
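
As a rough illustration of how a job like that inflates its output, here is a minimal sketch in plain Python (not Hadoop code). It assumes, purely for illustration, that the map function emits every prefix of the input text as its own record; that emit rule and the names `prefix_mapper` and `output_stats` are hypothetical and not taken from the code I saw.

```python
def prefix_mapper(text):
    """Hypothetical map function: emit one record per prefix of the input text."""
    for end in range(1, len(text) + 1):
        yield text[:end]


def output_stats(text):
    """Return (record_count, total_output_chars) for a single input record."""
    records = list(prefix_mapper(text))
    return len(records), sum(len(record) for record in records)


if __name__ == "__main__":
    record_count, total_chars = output_stats("ABCDE")
    print(record_count, total_chars)
```

Under that assumption an input of n characters yields n records and n(n+1)/2 characters of output, while the map work itself is only a string slice, so most of the cost would land in spill, shuffle and merge rather than in CPU.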
2011/5/19 elton sky <[EMAIL PROTECTED]>:
> I am picking up this topic again, because what I am looking for is something
> that is not CPU bound. Augmenting data for ETL and generating an index are good
> examples. Neither of them requires much CPU time on the map side. The main
> bottleneck for them is shuffle and merge.
> Market basket analysis is CPU intensive in the map phase, because it samples
> all possible combinations of items.
> I am still looking for more applications that create bigger output and
> are not CPU bound.
> Any further ideas? I would appreciate them.
> On Tue, May 3, 2011 at 3:10 AM, Steve Loughran <[EMAIL PROTECTED]> wrote:
>> On 30/04/2011 05:31, elton sky wrote:
>>> Thank you for the suggestions:
>>> Weblog analysis, market basket analysis and generating search index.
>>> I guess for these applications we need more reduces than maps, because of
>>> the large intermediate output, don't we? Besides, the input split for each
>>> map should be smaller than usual, because the workload for spill and merge
>>> on the map's local disk is heavy.
>> any form of rendering can generate very large images
>> see: http://www.hpl.hp.com/techreports/2009/HPL-2009-345.pdf
Kind regards,