Re: Applications creates bigger output than input?
Ted Dunning 2011-04-30, 20:32
Co-occurrence analysis is commonly used in recommendations. These can produce output much larger than the input.
Come on over to the Mahout project if you would like to talk to a bunch of
people who work on these problems.
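Here is a minimal Python sketch of why co-occurrence counting grows the data. It is not Mahout's implementation, and the basket data and function names are hypothetical. A basket with n distinct items yields n(n-1)/2 item pairs, so the pair records quickly outnumber the input items.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # Dedupe and sort so (a, b) and (b, a) map to the same key.
        items = sorted(set(basket))
        for pair in combinations(items, 2):
            counts[pair] += 1
    return counts


def emitted_pairs(baskets):
    """Number of (pair, 1) records a map phase would emit before any combining."""
    return sum(len(set(b)) * (len(set(b)) - 1) // 2 for b in baskets)


# Hypothetical example baskets (e.g. one per user or per shopping trip).
baskets = [
    ["milk", "bread", "eggs", "butter"],
    ["milk", "bread", "jam"],
    ["eggs", "butter", "flour", "sugar", "milk"],
]
input_items = sum(len(b) for b in baskets)
print(input_items, emitted_pairs(baskets), len(cooccurrence_counts(baskets)))
```

Run as-is, it prints `12 19 15`: 12 input items become 19 emitted pair records and 15 distinct pairs, and the gap widens quadratically as baskets get longer.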
On Fri, Apr 29, 2011 at 9:31 PM, elton sky <[EMAIL PROTECTED]> wrote:
> Thank you for the suggestions:
> weblog analysis, market basket analysis and generating a search index.
> I guess for these applications we need more reduce tasks than map tasks to
> handle the large intermediate output, don't we? Besides, the input split for
> each map should be smaller than usual, because the spill-and-merge workload
> on the map's local disk is heavy.
> On Sat, Apr 30, 2011 at 11:22 AM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
> > On Fri, Apr 29, 2011 at 5:02 AM, elton sky <[EMAIL PROTECTED]> wrote:
> > > For my benchmarking purposes, I am looking for some non-trivial, real-life
> > > applications that create *bigger* output than their input. A trivial
> > > example I can think of is cross join...
> > >
> > As you say, almost all cross join jobs have that property. The other case
> > that almost always fits into that category is generating an index. For
> > example, if your input is a corpus of documents and you want to generate
> > the list of documents that contain each word, the output (and especially
> > the shuffle data) is much larger than the input.
> > -- Owen
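Owen's inverted-index example can be sketched in plain Python as a simulated map, shuffle and reduce. This is a toy under stated assumptions, not Hadoop code. The corpus, the document ids and the function names are made up for illustration. Each map record repeats the document id for every distinct word in that document, which is why the shuffle data outgrows the input.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (word, doc_id) record per distinct word in a document."""
    for word in sorted(set(text.lower().split())):
        yield word, doc_id


def build_inverted_index(corpus):
    """Simulate map -> shuffle -> reduce for an inverted index.

    Returns the index plus the number of shuffled records and their size in
    characters (word + doc id only, ignoring any serialization overhead).
    """
    shuffled = defaultdict(list)  # stands in for the shuffle's grouping by key
    records = 0
    shuffle_chars = 0
    for doc_id, text in corpus.items():
        for word, d in map_phase(doc_id, text):
            shuffled[word].append(d)
            records += 1
            shuffle_chars += len(word) + len(d)
    # Reduce: each word's posting list is the sorted list of document ids.
    index = {word: sorted(ids) for word, ids in shuffled.items()}
    return index, records, shuffle_chars


# Hypothetical toy corpus.
corpus = {
    "doc1": "the quick brown fox",
    "doc2": "the lazy dog",
    "doc3": "the quick dog",
}
index, records, shuffle_chars = build_inverted_index(corpus)
input_chars = sum(len(text) for text in corpus.values())
print(index["the"], records, input_chars, shuffle_chars)
```

It prints `['doc1', 'doc2', 'doc3'] 10 44 77`: 44 characters of input become 10 shuffle records totalling 77 characters, even before counting any per-record serialization overhead.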