Hadoop >> mail # user >> what does "keep 10% map, 40% reduce" mean in gridmix2's README?


Re: what does "keep 10% map, 40% reduce" mean in gridmix2's README?
Hi, Chen,

Thank you for your reply.

However, almost no value in the README is larger than 100% (the largest is
116%), which would mean that the intermediate results are never much larger
than the input.

I don't think that can be the case: the input data is compressed (500GB
compressed, 2TB uncompressed), so the generated data should expand to be
much larger.

This is just my guess; can anyone correct me?
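
To make the two readings concrete, here is a small Python sketch of the arithmetic. It assumes the interpretation Chen suggests in the quoted reply below (each "keep" percentage is the ratio of a phase's output size to its input size), which the README does not confirm. The helper name `pipeline_sizes` and the `STAGES` list are made up for illustration and are not part of gridmix2.

```python
# Sketch only: assumes "keep X% map, Y% reduce" means
# map output = X% of map input, reduce output = Y% of map output.
# The percentages are copied from the README excerpt quoted below.
STAGES = [
    ("Compute1", 0.10, 0.40),
    ("Compute2", 1.00, 0.77),
    ("Compute3", 1.16, 0.91),
]


def pipeline_sizes(input_gb, stages=STAGES):
    """Return (stage, map_output_gb, reduce_output_gb) for each stage."""
    sizes = []
    current = input_gb
    for name, keep_map, keep_reduce in stages:
        map_out = current * keep_map          # size after the map phase
        reduce_out = map_out * keep_reduce    # size after the reduce phase
        sizes.append((name, map_out, reduce_out))
        current = reduce_out                  # next stage reads this output
    return sizes


if __name__ == "__main__":
    # Run once against the compressed size and once against the
    # uncompressed size, since the README does not say which one applies.
    for label, input_gb in (("compressed", 500), ("uncompressed", 2000)):
        print(label)
        for name, map_out, reduce_out in pipeline_sizes(input_gb):
            print(f"  {name}: map {map_out:.1f} GB, reduce {reduce_out:.1f} GB")
```

With the 500GB compressed size as the baseline, Compute1 gives 50GB after the map phase and 20GB after the reduce phase, matching Chen's example. Whether the percentages apply to the compressed or the uncompressed size is exactly the open question.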

Best,

Nan
On Thu, Jun 14, 2012 at 11:50 PM, Chen He <[EMAIL PROTECTED]> wrote:

> Hi Nan
>
> probably the map stage will output 10% of the total input, and the reduce
> stage will output 40% of intermediate results (10% of total input).
>
> For example, 500GB input, after the map stage, it will be 50GB and it will
> become 20GB after the reduce stage.
>
> It may be similar to the loadgen in hadoop test example.
>
> Does anyone have a suggestion?
>
> Chen
> System Architect Intern @ ZData
> PhD student@CSE Dept.
>
>
> On Thu, Jun 14, 2012 at 1:58 AM, Nan Zhu <[EMAIL PROTECTED]> wrote:
>
> > Hi, all
> >
> > I'm using gridmix2 to test my cluster, and its README file contains
> > statements like the following:
> >
> > +1) Three stage map/reduce job
> > +     Input:      500GB compressed (2TB uncompressed) SequenceFile
> > +                 (k,v) = (5 words, 100 words)
> > +                 hadoop-env: FIXCOMPSEQ
> > +     Compute1:   keep 10% map, 40% reduce
> > +     Compute2:   keep 100% map, 77% reduce
> > +                 Input from Compute1
> > +     Compute3:   keep 116% map, 91% reduce
> > +                 Input from Compute2
> > +     Motivation: Many user workloads are implemented as pipelined map/reduce
> > +                 jobs, including Pig workloads
> >
> >
> > Can anyone tell me what "keep 10% map, 40% reduce" means here?
> >
> > Best,
> >
> > --
> > Nan Zhu
> > School of Electronic, Information and Electrical Engineering,229
> > Shanghai Jiao Tong University
> > 800,Dongchuan Road,Shanghai,China
> > E-Mail: [EMAIL PROTECTED]
> >
>

--
Nan Zhu
School of Electronic, Information and Electrical Engineering,229
Shanghai Jiao Tong University
800,Dongchuan Road,Shanghai,China
E-Mail: [EMAIL PROTECTED]