Hadoop user mailing list: what does "keep 10% map, 40% reduce" mean in gridmix2's README?


Nan Zhu 2012-06-14, 06:58
Chen He 2012-06-14, 15:50
Nan Zhu 2012-06-14, 16:42
Chen He 2012-06-14, 17:09
gemini alex 2012-06-25, 06:50
Re: what does "keep 10% map, 40% reduce" mean in gridmix2's README?
yes, what's the relationship between them?

On Mon, Jun 25, 2012 at 2:50 PM, gemini alex <[EMAIL PROTECTED]> wrote:

> Did you configure map output compression?
>
>
> 2012/6/15 Chen He <[EMAIL PROTECTED]>
>
> > Let me know when you get the correct answer.
> >
> > Chen
> >
> > On Thu, Jun 14, 2012 at 11:42 AM, Nan Zhu <[EMAIL PROTECTED]> wrote:
> >
> > > Hi, Chen,
> > >
> > > Thank you for your reply,
> > >
> > > but in its README there is no value larger than 100%, which would mean
> > > that the size of the intermediate results is never larger than the
> > > input size.
> > >
> > > That cannot be the case: because the input data is compressed, the
> > > generated data will expand to be very large....
> > >
> > > It's just my guess; can anyone correct me?
> > >
> > > Best,
> > >
> > > Nan
> > >
> > >
> > > On Thu, Jun 14, 2012 at 11:50 PM, Chen He <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi Nan
> > > >
> > > > Probably the map stage will output 10% of the total input, and the
> > > > reduce stage will output 40% of the intermediate results (which are
> > > > 10% of the total input).
> > > >
> > > > For example, with 500GB of input, the data will be 50GB after the map
> > > > stage and 20GB after the reduce stage.
> > > >
> > > > It may be similar to the loadgen in the hadoop test example.
> > > >
> > > > Does anyone have a suggestion?
> > > >
> > > > Chen
> > > > System Architect Intern @ ZData
> > > > PhD student@CSE Dept.
> > > >
> > > >
> > > > On Thu, Jun 14, 2012 at 1:58 AM, Nan Zhu <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi, all
> > > > >
> > > > > I'm using gridmix2 to test my cluster, while in its README file,
> > > > > there are statements like the following:
> > > > >
> > > > > +1) Three stage map/reduce job
> > > > > +          Input:      500GB compressed (2TB uncompressed) SequenceFile
> > > > > +                 (k,v) = (5 words, 100 words)
> > > > > +                 hadoop-env: FIXCOMPSEQ
> > > > > +          Compute1:   keep 10% map, 40% reduce
> > > > > +          Compute2:   keep 100% map, 77% reduce
> > > > > +                 Input from Compute1
> > > > > +          Compute3:   keep 116% map, 91% reduce
> > > > > +                 Input from Compute2
> > > > > +          Motivation: Many user workloads are implemented as pipelined
> > > > > +                 map/reduce jobs, including Pig workloads
> > > > >
> > > > >
> > > > > Can anyone tell me what "keep 10% map, 40% reduce" means here?
> > > > >
> > > > > Best,
> > > > >
> > > > > --
> > > > > Nan Zhu
> > > > > School of Electronic, Information and Electrical Engineering,229
> > > > > Shanghai Jiao Tong University
> > > > > 800,Dongchuan Road,Shanghai,China
> > > > > E-Mail: [EMAIL PROTECTED]
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Nan Zhu
> > > School of Electronic, Information and Electrical Engineering,229
> > > Shanghai Jiao Tong University
> > > 800,Dongchuan Road,Shanghai,China
> > > E-Mail: [EMAIL PROTECTED]
> > >
> >
>

--
Nan Zhu
School of Electronic, Information and Electrical Engineering,229
Shanghai Jiao Tong University
800,Dongchuan Road,Shanghai,China
E-Mail: [EMAIL PROTECTED]
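
The reading Chen He proposes in the quoted reply can be checked with a little arithmetic. The sketch below assumes (the thread never confirms this) that "keep N% map, M% reduce" means the map stage emits N% of its input bytes and the reduce stage emits M% of the map output, and that each Compute stage reads the previous stage's reduce output. The names `run_stage`, `run_pipeline` and `STAGES` are made up for illustration and are not part of gridmix2. Sizes are in GB of whichever representation (compressed or uncompressed) the percentages refer to, which is the open question in this thread.

```python
# Hypothetical helper names; this is not gridmix2 code, only arithmetic
# under the interpretation suggested in the thread.

# (stage name, fraction kept by map, fraction kept by reduce), from the README excerpt
STAGES = [
    ("Compute1", 0.10, 0.40),
    ("Compute2", 1.00, 0.77),
    ("Compute3", 1.16, 0.91),
]


def run_stage(input_gb, keep_map, keep_reduce):
    """Return (map output, reduce output) in GB for one stage.

    Assumption: map output = keep_map * stage input, and
    reduce output = keep_reduce * map output.
    """
    map_out = input_gb * keep_map
    reduce_out = map_out * keep_reduce
    return map_out, reduce_out


def run_pipeline(input_gb, stages=STAGES):
    """Chain the stages, feeding each reduce output into the next stage."""
    rows = []
    size = input_gb
    for name, keep_map, keep_reduce in stages:
        map_out, reduce_out = run_stage(size, keep_map, keep_reduce)
        rows.append((name, size, map_out, reduce_out))
        size = reduce_out
    return rows


for name, stage_in, map_out, reduce_out in run_pipeline(500.0):
    print(f"{name}: in={stage_in:.2f}GB map={map_out:.2f}GB reduce={reduce_out:.2f}GB")
```

Under that assumption, Compute1 turns 500GB into 50GB after the map stage and 20GB after the reduce stage, which matches Chen He's example. Passing 2000.0 instead of 500.0 shows the same ratios applied to the uncompressed size.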