While you're grokking this aspect of MapReduce's configuration, you may want
to check out https://issues.apache.org/jira/browse/MAPREDUCE-64, which is on
its way into trunk right now. Chris Douglas from Yahoo! has posted a very
nice explanation of how buffers are managed during the shuffle and which
parameters affect the behavior.
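
To make the interplay between those parameters concrete, here is a rough
sketch of how the map-side sort buffer appears to be carved up in 0.20. The
property names (io.sort.mb, io.sort.record.percent, io.sort.spill.percent)
and their defaults are real 0.20 settings. The sort_buffer_layout function is
hypothetical, and the 16 bytes of bookkeeping per record is an assumption
about the 0.20 MapTask code, so treat the numbers as an approximation rather
than the exact implementation.

```python
# Sketch only: approximates how 0.20 splits io.sort.mb between record
# bookkeeping and serialized key/value data. Not taken from Hadoop source.

RECORD_ACCOUNTING_BYTES = 16  # assumed per-record bookkeeping cost in 0.20


def sort_buffer_layout(io_sort_mb=100, record_percent=0.05, spill_percent=0.80):
    """Return the approximate buffer split for the given settings."""
    total_bytes = io_sort_mb << 20  # io.sort.mb is given in megabytes

    # A fixed fraction of the buffer is reserved for per-record bookkeeping.
    accounting_bytes = int(total_bytes * record_percent)
    accounting_bytes -= accounting_bytes % RECORD_ACCOUNTING_BYTES

    # Everything else holds the serialized keys and values.
    data_bytes = total_bytes - accounting_bytes
    max_records = accounting_bytes // RECORD_ACCOUNTING_BYTES

    # A spill starts once either region passes the spill threshold.
    return {
        "data_bytes": data_bytes,
        "max_records": max_records,
        "spill_at_bytes": int(data_bytes * spill_percent),
        "spill_at_records": int(max_records * spill_percent),
    }


if __name__ == "__main__":
    for mb in (100, 250):  # the default, and the value Mark is trying to set
        print(mb, sort_buffer_layout(io_sort_mb=mb))
```

Under these assumptions, a job with many small records can hit the record
limit long before the data region fills, so raising io.sort.mb alone may not
reduce the number of spills.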
On Tue, Dec 22, 2009 at 12:30 PM, Mark Vigeant <[EMAIL PROTECTED]> wrote:
> Thank you for the responses guys!
> First, to Patrick, I didn't set it in the code, though I will try it
> because that's a really good idea to set it there, so I shall play around
> with that.
> Long: I should have clarified, I am using 0.20.1, and so this is a bit
> different. I set the parameter in mapred-site.xml and for some reason it's
> just not getting implemented. Thank you anyways, though!
> -----Original Message-----
> From: Long Van Nguyen Dinh [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, December 22, 2009 12:17 PM
> To: [EMAIL PROTECTED]
> Subject: Re: io.sort.mb configuration?
> Hadoop keeps the defaults for all configuration in one file
> (hadoop-default.xml as of version 0.19). Don't change the values in that
> file, since they won't take effect. Instead, copy the parameter into
> hadoop-site.xml, where you set up the cluster, and set the value you want
> there.
> Long Van
> On Tue, Dec 22, 2009 at 11:40 AM, Patrick Angeles
> <[EMAIL PROTECTED]> wrote:
> > You can also set that param per-job. Maybe you called some code that did
> > that behind the scenes?
> > On Tue, Dec 22, 2009 at 11:10 AM, Mark Vigeant <[EMAIL PROTECTED]> wrote:
> >> Hey Everyone-
> >> I've been playing around with Hadoop and Hbase for a while and I noticed
> >> that when running a program to upload data into an HTable I saw the
> >> following log line:
> >> INFO mapred.MapTask: io.sort.mb = 100
> >> That is the default value, but in the mapred configuration on all nodes
> >> in my cluster I set this value to 250. Could it be that my program isn't
> >> accessing the configuration properly? Is that too large a value? Or is
> >> it most likely just a foolish syntax error on my part?
> >> Thank you very much, all input is appreciated.
> >> Mark Vigeant
> >> RiskMetrics Group, Inc.
> This email message and any attachments are for the sole use of the intended
> recipients and may contain proprietary and/or confidential information which
> may be privileged or otherwise protected from disclosure. Any unauthorized
> review, use, disclosure or distribution is prohibited. If you are not an
> intended recipient, please contact the sender by reply email and destroy the
> original message and any copies of the message as well as any attachments to
> the original message.