Looks like mapred.userlog.limit.kb is implemented by keeping the log events in an
in-memory list, and the logs are not written to disk until the job finishes or
is killed. That doesn't sound acceptable to me.
Well, I am not the only one with this problem. See MAPREDUCE-1100.
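For illustration only, here is a minimal Python model of the behavior described above. It assumes, without confirming it against the Hadoop source, that the limit is enforced by holding only the newest events in a bounded in-memory queue and writing them out when the task closes. The class `BoundedTailLog`, the 100-byte-per-event conversion and the "0 means unlimited" convention are hypothetical and are not Hadoop's API.

```python
from collections import deque


class BoundedTailLog:
    """Hypothetical model of a size-capped task log (not Hadoop's API)."""

    def __init__(self, limit_kb, event_size=100):
        # Assumption: a limit of 0 or less means "unlimited", like the default.
        if limit_kb <= 0:
            max_events = None
        else:
            # Assumption: the KB budget becomes an event count via a fixed
            # per-event size estimate (100 bytes here, purely illustrative).
            max_events = (limit_kb * 1024) // event_size
        # Events live only in memory; a full deque silently evicts its oldest entry.
        self.tail = deque(maxlen=max_events)
        # Stand-in for the on-disk log file; it stays empty until close().
        self.disk = []

    def append(self, event):
        self.tail.append(event)

    def close(self):
        # Only when the task exits are the retained events "written" out.
        self.disk.extend(self.tail)
        self.tail.clear()


if __name__ == "__main__":
    log = BoundedTailLog(limit_kb=1)  # 1024 // 100 -> at most 10 events kept
    for i in range(25):
        log.append(f"event {i}")
    print(len(log.disk))  # 0: nothing is on "disk" while the task runs
    log.close()
    print(len(log.disk), log.disk[0])  # 10 event 15
```

In this model the file never exceeds the cap, but the dropped events are the oldest ones, and nothing is visible on disk while the task is still running.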
On Sun, Aug 26, 2012 at 1:58 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> Hi Koert,
> On Sun, Aug 26, 2012 at 11:20 PM, Koert Kuipers <[EMAIL PROTECTED]> wrote:
> > Hey Harsh,
> > Thanks for responding!
> > Would limiting the logging for each task via mapred.userlog.limit.kb be
> > strictly enforced (while the job is running)? That would solve my issue of
> > runaway logging on a job filling up the datanode disks. I would set the
> > limit high, since in general I do want to retain logs, just not in case a
> > single rogue job starts producing many gigabytes of logs.
> > Thanks!
> It is not strictly enforced the way counter limits are. Exceeding it
> wouldn't fail the task; it would only cause the extra logged events not to
> appear at all (thereby limiting the size).
> > On Sun, Aug 26, 2012 at 1:44 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> >> Hi Koert,
> >> To answer on point, there is no turning off this feature.
> >> Since you don't seem to care much about task logs persisting,
> >> perhaps consider lowering mapred.userlog.retain.hours from 24 hours
> >> to something smaller (such as 1 hour)? Or you may even limit the
> >> logging from each task to a certain number of KB via
> >> mapred.userlog.limit.kb, which is unlimited by default.
> >> Would either of these work for you?
> >> On Sun, Aug 26, 2012 at 11:02 PM, Koert Kuipers <[EMAIL PROTECTED]> wrote:
> >> > We have smaller nodes (4 to 6 disks), and we used to write logs to the
> >> > same disk as the OS. So if that disk goes, I don't really care about
> >> > tasktrackers failing. Also, the fact that logs were written to a single
> >> > partition meant that I could make sure they would not grow too large in
> >> > case someone had overly verbose logging on a large job. With
> >> > MAPREDUCE-2415, a job that does a massive amount of logging can fill up
> >> > all the mapred.local.dir directories, which in our case are on the same
> >> > partitions as the HDFS data dirs, so now faulty logging can fill up HDFS
> >> > storage, which I really don't like. Any ideas?
> >> --
> >> Harsh J