Harsh - I'd be inclined to think it's worse than just setting mapred.jobtracker.completeuserjobs.maximum - the only case this would solve is if a single user submitted 25 *large* jobs (in terms of tasks) over a single 24-hr window.
David - I'm guessing you aren't using the CapacityScheduler - that would help you with more controls, limits on jobs etc.
More details here:
http://hadoop.apache.org/common/docs/r1.0.3/capacity_scheduler.html
In particular, look at the example config there and let us know if you need help understanding any of it.
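As a rough sketch of what turning it on involves (assuming the stock Hadoop 1.0.x property names; a CDH3 build may differ, and the limit value below is purely illustrative, not a recommendation), the scheduler is selected in mapred-site.xml and a cluster-wide job limit is set in capacity-scheduler.xml:

```xml
<!-- mapred-site.xml: select the CapacityScheduler as the JT's task scheduler -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>

<!-- capacity-scheduler.xml: cap on jobs the scheduler will initialize
     concurrently across the cluster (500 is an assumed example value) -->
<property>
  <name>mapred.capacity-scheduler.maximum-system-jobs</name>
  <value>500</value>
</property>
```

The per-queue and per-user limits are described in the guide linked above; check the names against the docs for the version you actually run before copying anything.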
Arun
On Jun 9, 2012, at 10:40 PM, Harsh J wrote:
> Hey David,
>
> Primarily you'd need to lower
> "mapred.jobtracker.completeuserjobs.maximum" in your mapred-site.xml
> to a value below 25. I recommend using 5, if you don't need much
> retention of job info per user. This will help keep the JT's live
> memory usage in check and stop your crashes, instead of you having to
> raise your heap all the time. There's no "leak", but this config's
> default of 100 causes a lot of trouble for a JT that runs many jobs
> per day (from several users).
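>
> A minimal sketch of that change, assuming the property goes in the
> JobTracker's own mapred-site.xml and that the JT is restarted
> afterwards so it picks up the new value (5 is the value suggested
> above; tune it to how much per-user job history you want kept in
> memory):
>
> ```xml
> <!-- mapred-site.xml on the JobTracker host -->
> <property>
>   <!-- completed jobs retained in JT memory per user (default: 100) -->
>   <name>mapred.jobtracker.completeuserjobs.maximum</name>
>   <value>5</value>
> </property>
> ```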
>
> Try it out and let us know!
>
> On Sat, Jun 9, 2012 at 12:37 AM, David Rosenstrauch <[EMAIL PROTECTED]> wrote:
>> We're running 0.20.2 (Cloudera cdh3u4).
>>
>> What configs are you referring to?
>>
>> Thanks,
>>
>> DR
>>
>>
>> On 06/08/2012 02:59 PM, Arun C Murthy wrote:
>>>
>>> This shouldn't be happening at all...
>>>
>>> What version of hadoop are you running? Potentially you need configs to
>>> protect the JT that you are missing, those should ensure your hadoop-1.x JT
>>> is very reliable.
>>>
>>> Arun
>>>
>>> On Jun 8, 2012, at 8:26 AM, David Rosenstrauch wrote:
>>>
>>>> Our job tracker has been seizing up with Out of Memory (heap space)
>>>> errors for the past 2 nights. After the first night's crash, I doubled the
>>>> heap space (from the default of 1GB) to 2GB before restarting the job.
>>>> After last night's crash I doubled it again to 4GB.
>>>>
>>>> This all seems a bit puzzling to me. I wouldn't have thought that the
>>>> job tracker should require so much memory. (The NameNode, yes, but not the
>>>> job tracker.)
>>>>
>>>> Just wondering if this behavior sounds reasonable, or if perhaps there
>>>> might be a bigger problem at play here. Anyone have any thoughts on the
>>>> matter?
>>>>
>>>> Thanks,
>>>>
>>>> DR
>>>
>>>
>>> --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>>
>>> http://hortonworks.com/
>>>
>>>
>>
>>
>
>
>
> --
> Harsh J
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/