Koert Kuipers 2012-08-26, 17:32
Harsh J 2012-08-26, 17:44
Thanks for responding!
Would the per-task limit set via mapred.userlog.limit.kb be strictly
enforced while the job is running? That would solve my issue of a job
with runaway logging filling up the datanode disks. I would set the
limit high, since in general I do want to retain logs; I just don't
want a single rogue job producing many gigabytes of them.
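Concretely, I was thinking of something like this in mapred-site.xml
(just a sketch; the 10 MB cap and 72-hour retention are illustrative
values I made up, not recommendations):

    <property>
      <name>mapred.userlog.limit.kb</name>
      <value>10240</value> <!-- keep at most ~10 MB of logs per task attempt -->
    </property>
    <property>
      <name>mapred.userlog.retain.hours</name>
      <value>72</value> <!-- purge task logs after 3 days -->
    </property>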
On Sun, Aug 26, 2012 at 1:44 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> Hi Koert,
> To answer your question directly: there is no turning this feature off.
> Since you don't seem to care much about task logs persisting, perhaps
> consider lowering mapred.userlog.retain.hours from its default of 24
> hours to something much shorter (such as 1 hour)? Or you could cap the
> logging from each task at a certain number of KB via
> mapred.userlog.limit.kb, which is unlimited by default.
> Would either of these work for you?
> On Sun, Aug 26, 2012 at 11:02 PM, Koert Kuipers <[EMAIL PROTECTED]> wrote:
> > We have smaller nodes (4 to 6 disks), and we used to write logs to
> > the same disk as the OS. So if that disk goes, I don't really care
> > about the tasktracker failing. Also, the fact that logs were written
> > to a single partition meant I could make sure they would not grow too
> > large in case someone had overly verbose logging on a large job. With
> > the new behavior, a job that does a massive amount of logging can
> > fill up all the mapred.local.dir locations, which in our case are on
> > the same partitions as the HDFS data dirs, so now faulty logging can
> > fill up HDFS storage, which I really don't like. Any ideas?
> Harsh J
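PS: to partly answer my own question about enforcement: the stock 1.x
conf/log4j.properties appears to wire the limit into the task JVMs
through TaskLogAppender (trimmed copy below, so treat it as a sketch).
If I read it right, the appender keeps only the tail of the log stream
once a limit is set, so the on-disk syslog should not grow past the
cap even while the task is still running. That is my reading, not
gospel:

    # defaults, overridden per task JVM by the tasktracker
    hadoop.tasklog.taskid=null
    hadoop.tasklog.totalLogFileSize=100

    log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
    log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
    log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
    log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
    log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n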