Is there a way to turn off MAPREDUCE-2415?
Koert Kuipers 2012-08-26, 17:32
We have smaller nodes (4 to 6 disks), and we used to write logs to the same disk as the OS. So if that disk goes, I don't really care about tasktrackers failing. Also, because logs were written to a single partition, I could make sure they would not grow too large if someone had overly verbose logging on a large job. With MAPREDUCE-2415, a job that does a massive amount of logging can fill up all the mapred.local.dir directories, which in our case are on the same partitions as the HDFS data dirs, so faulty logging can now fill up HDFS storage, which I really don't like. Any ideas?
Re: Is there a way to turn off MAPREDUCE-2415?
Harsh J 2012-08-26, 17:44
Hi Koert,
To answer on point, there is no turning off this feature.
Since you don't seem to care much about task logs persisting, perhaps consider lowering mapred.userlog.retain.hours to less than 24 hours (such as 1h)? Or you could limit the logging from each task to a certain number of KB via mapred.userlog.limit.kb, which is unlimited by default.
Would either of these work for you?
-- Harsh J
Re: Is there a way to turn off MAPREDUCE-2415?
Koert Kuipers 2012-08-26, 17:50
Hey Harsh,
Thanks for responding!
Would limiting the logging for each task via mapred.userlog.limit.kb be strictly enforced (while the job is running)? That would solve my issue of runaway logging on a job filling up the datanode disks. I would set the limit high, since in general I do want to retain logs, just not when a single rogue job starts producing many gigabytes of logs.
Thanks!
Re: Is there a way to turn off MAPREDUCE-2415?
Harsh J 2012-08-26, 17:58
Hi Koert,
On Sun, Aug 26, 2012 at 11:20 PM, Koert Kuipers <[EMAIL PROTECTED]> wrote:
> Hey Harsh,
> Thanks for responding!
> Would limiting the logging for each task via mapred.userlog.limit.kb be
> strictly enforced (while the job is running)? That would solve my issue of
> runaway logging on a job filling up the datanode disks. I would set the
> limit high since in general I do want to retain logs, just not in case a
> single rogue job starts producing many gigabytes of logs.
> Thanks!
It is not strictly enforced the way counter limits are. Exceeding it wouldn't fail the task; it would only cause the extra logged events to not appear at all (thereby limiting the size).
-- Harsh J
Re: Is there a way to turn off MAPREDUCE-2415?
Koert Kuipers 2012-08-26, 18:07
Looks like mapred.userlog.limit.kb is implemented by keeping some list in memory, and the logs are not written to disk until the job finishes or is killed. That doesn't sound acceptable to me.
Well, I am not the only one with this problem. See MAPREDUCE-1100.
Re: Is there a way to turn off MAPREDUCE-2415?
Harsh J 2012-08-26, 18:21
Yes, that is true: it maintains N events in memory and then flushes them to disk upon closure. With a reasonable size (2 MB of logs, say) I don't see that causing any memory fill-up issues at all, since it does cap (and discard at tail).
The other alternative may be to lower the log verbosity of the task by setting mapred.map.child.log.level and/or mapred.reduce.child.log.level to WARN or ERROR.
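The capping behaviour described above (hold up to N events in memory, write them out only on close) can be sketched roughly as follows. This is a minimal illustration, not Hadoop's actual appender code: the class name TailEventBuffer and its methods are hypothetical, and the choice to drop the oldest event once the cap is reached is an assumption made for the sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; this is not a Hadoop class.
final class TailEventBuffer {
    private final int maxEvents;
    private final Deque<String> events = new ArrayDeque<>();

    TailEventBuffer(int maxEvents) {
        if (maxEvents <= 0) {
            throw new IllegalArgumentException("maxEvents must be positive");
        }
        this.maxEvents = maxEvents;
    }

    // Buffer one log event in memory; nothing reaches disk at this point.
    void append(String event) {
        if (events.size() >= maxEvents) {
            // Assumption for this sketch: the oldest event is the one dropped.
            events.removeFirst();
        }
        events.addLast(event);
    }

    // Called once when the log is closed: return the surviving events
    // (a real appender would write them to the log file here).
    List<String> close() {
        List<String> flushed = new ArrayList<>(events);
        events.clear();
        return flushed;
    }
}
```

Memory use stays bounded by the cap, but as the sketch makes visible, nothing is persisted until close() runs, and anything beyond the cap is silently lost rather than reported.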
-- Harsh J
Re: Is there a way to turn off MAPREDUCE-2415?
Koert Kuipers 2012-08-26, 18:39
Harsh,
I see the problem as follows: Usually we want to have people log what they want, as long as they don't threaten the stability of the system.
However, every once in a while somebody will submit a job that is overly verbose and will generate many gigabytes of logs in minutes. This is typically an honest mistake, and the person doesn't realize what is going on (why is my job so slow?). Limiting the general logging levels for everyone to deal with these mistakes seems ineffective. Telling the person to change the logging level for their job will not work either, since they don't realize what is going on and certainly didn't know in advance.
So all I really want is a very high, hard limit on the log size per job, to protect the system. Say many hundreds of megabytes or even gigabytes. But when this limit is reached I want logging to stop from that point on, or even the job to be killed. mapred.userlog.limit.kb seems the wrong tool for the job.
Before the logging got moved to the mapred.local.dir, I had a limit simply by limiting the size of the partition that logging went to.
Anyhow, looks like I will have to wait for MAPREDUCE-1100.
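As a rough sketch of what such a hard limit means in practice: bytes go straight to the underlying stream as they are written, and once the cap is hit everything after it is dropped. The CappedOutputStream class below is hypothetical and is not something Hadoop provides; how it would be wired into task logging is not shown and is left as an assumption.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper for illustration; not part of Hadoop.
final class CappedOutputStream extends FilterOutputStream {
    private final long maxBytes;
    private long written;

    CappedOutputStream(OutputStream out, long maxBytes) {
        super(out);
        this.maxBytes = maxBytes;
    }

    @Override
    public void write(int b) throws IOException {
        // Pass the byte through only while under the cap.
        if (written < maxBytes) {
            out.write(b);
            written++;
        }
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        long remaining = maxBytes - written;
        if (remaining <= 0) {
            return; // cap reached: drop everything from here on
        }
        // Write only as much of the chunk as still fits under the cap.
        int n = (int) Math.min(len, remaining);
        out.write(buf, off, n);
        written += n;
    }

    // A caller could poll this to decide whether to kill the job.
    boolean limitReached() {
        return written >= maxBytes;
    }
}
```

Unlike an in-memory event cap, output here reaches the underlying stream immediately, and limitReached() gives a hook for the "or even kill the job" variant.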
Have a good day! Koert
Re: Is there a way to turn off MAPREDUCE-2415?
Harsh J 2012-08-26, 19:03
Hi,
On Mon, Aug 27, 2012 at 12:09 AM, Koert Kuipers <[EMAIL PROTECTED]> wrote:
> Harsh,
>
> I see the problem as follows: Usually we want to have people log what they
> want, as long as they don't threaten the stability of the system.
>
> However every once in a while somebody will submit a job that is overly
> verbose and will generate many gigabytes of logs in minutes. This is
> typically an honest mistake, and the person doesn't realize what is going on
> (why is my job so slow?). Limiting the general logging levels for everyone
> to deal with these mistakes seems ineffective. Telling the person to change
> the logging level for his job will not work either since he/she doesn't
> realize what is going on and certainly didn't know in advance.
I had meant to say you could enforce the logging level on the child tasks via finalized job options, but yeah, that'd be way too restrictive.
> So all I really want is a very high and hard limit on the log size per job,
> to protect the system. Say many hundreds of megabytes or even gigabytes. But
> when this limit is reached I want logging to stop from that point on, or
> even the job to be killed. mapred.userlog.limit.kb seems the wrong tool for
> the job.
Hundreds of MB of logs seems too much for a single task to emit. I believe a good limit is < 10 MB. But yeah, makes sense that one could want more for different forms of jobs and purposes. For such a requirement, I agree the limit.kb isn't the right solution. Perhaps just the retain hours value then.
> Before the logging got moved to the mapred.local.dir I had a limit simply by
> limiting the size of the partition that logging went to.
>
> Anyhow, looks like I will have to wait for MAPREDUCE-1100
I agree.
Harsh J