MapReduce, mail # user - Hadoop counter


Re: Hadoop counter
Lin Ma 2012-10-19, 16:33
Hi Harsh,

Thanks for the brilliant reply.

Regarding your comment, "Yes, they are ultimately stored at JT until the job
is retired out of heap memory (in which case, they get stored into the
JobHistory location and format)": does this mean that only a running job's
counters consume JT memory, and that for a completed job the counters are
stored on disk? (I assume the "JobHistory location and format" is on disk.)

regards,
Lin

On Sat, Oct 20, 2012 at 12:19 AM, Harsh J <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Inline.
>
> On Fri, Oct 19, 2012 at 9:39 PM, Lin Ma <[EMAIL PROTECTED]> wrote:
> > Hi Harsh,
> >
> > Thanks for the great reply. Two basic questions,
> >
> > - Where are the counters' values stored for a successful job? On the JT?
>
> Yes, they are ultimately stored at JT until the job is retired out of
> heap memory (in which case, they get stored into the JobHistory
> location and format).
>
> > - Supposing a specific job A completed successfully and updated related
> > counters, is it possible for another specific job B to read counters
> > updated by previous job A? If yes, how?
>
> Yes, possible, use the RunningJob object from the previous job (or
> capture one) and query it. APIs you're interested in:
>
> Grab a query-able object (RunningJob and/or a Job):
>
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/JobClient.html#getJob(org.apache.hadoop.mapred.JobID)
> or
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Cluster.html#getJob(org.apache.hadoop.mapreduce.JobID)
>
> Query counters:
>
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/RunningJob.html#getCounters()
> or
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#getCounters()
>
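
The old-API route described above (JobClient.getJob followed by RunningJob.getCounters) might look roughly like the sketch below. It is only an illustration, not code from this thread: the class name is made up, the job ID and the counter group/name are placeholders passed on the command line, and it assumes the Hadoop jars and cluster configuration are on the classpath. Whether the lookup still succeeds after job A has been retired from JT memory depends on the cluster, so both null checks are kept.

```java
import java.io.IOException;

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

// Hypothetical helper: looks up one counter of an earlier job by job ID.
public class ReadPreviousJobCounter {
    public static void main(String[] args) throws IOException {
        // args[0]: ID of the earlier job A, as printed when it was submitted.
        // args[1], args[2]: counter group and counter name (caller-supplied placeholders).
        JobID jobA = JobID.forName(args[0]);

        // JobConf() picks up the cluster settings found on the classpath.
        JobClient client = new JobClient(new JobConf());

        RunningJob job = client.getJob(jobA);
        if (job == null) {
            // The cluster no longer knows this job ID.
            System.err.println("Job not found: " + jobA);
            return;
        }

        Counters counters = job.getCounters();
        if (counters == null) {
            System.err.println("No counters available for: " + jobA);
            return;
        }

        long value = counters.findCounter(args[1], args[2]).getValue();
        System.out.println(args[1] + "." + args[2] + " = " + value);
    }
}
```

Job B's driver could run the same lookup before submitting and pass the value to its tasks through the job configuration.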
> > regards,
> > Lin
> >
> >
> > On Fri, Oct 19, 2012 at 11:50 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> >>
> >> Bejoy is almost right, except that counters are reported as the tasks
> >> themselves progress (via TT heartbeats to the JT, actually), but the final
> >> counter representation is computed only from the successful task reports
> >> the job received, not from any failed or killed ones.
> >>
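
To make the rule above concrete, here is a small self-contained simulation of "the final counters are computed only from successful task reports". It does not use any Hadoop classes: AttemptReport, State and finalCounters are hypothetical names invented for this sketch, and the real JobTracker bookkeeping is certainly more involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CounterAggregationSketch {
    enum State { SUCCEEDED, FAILED, KILLED }

    /** Last counters reported by one task attempt (hypothetical type, not a Hadoop class). */
    static final class AttemptReport {
        final String attemptId;
        final State state;
        final Map<String, Long> counters;

        AttemptReport(String attemptId, State state, Map<String, Long> counters) {
            this.attemptId = attemptId;
            this.state = state;
            this.counters = counters;
        }
    }

    /** Sums counters over successful attempts only; failed and killed attempts are skipped. */
    static Map<String, Long> finalCounters(List<AttemptReport> reports) {
        Map<String, Long> totals = new TreeMap<>();
        for (AttemptReport report : reports) {
            if (report.state != State.SUCCEEDED) {
                continue; // counters of failed or killed attempts never reach the final value
            }
            for (Map.Entry<String, Long> e : report.counters.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<AttemptReport> reports = new ArrayList<>();
        reports.add(new AttemptReport("m_000000_0", State.SUCCEEDED,
                Collections.singletonMap("RECORDS", 10L)));
        // First attempt of task 1 failed after counting 7 records ...
        reports.add(new AttemptReport("m_000001_0", State.FAILED,
                Collections.singletonMap("RECORDS", 7L)));
        // ... and its retry succeeded, so only the retry's 20 records count.
        reports.add(new AttemptReport("m_000001_1", State.SUCCEEDED,
                Collections.singletonMap("RECORDS", 20L)));

        System.out.println(finalCounters(reports)); // prints {RECORDS=30}
    }
}
```

The 7 records counted by the failed attempt may have been visible while the job was running, but they are not part of the final total.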
> >> On Fri, Oct 19, 2012 at 8:51 PM, Bejoy KS <[EMAIL PROTECTED]> wrote:
> >> > Hi Jay
> >> >
> >> > Counters are reported to the JT at the end of a task. So if a task fails,
> >> > the counters from that task are not sent to the JT and hence won't be
> >> > included in the final value of the counters from that job.
> >> > Regards
> >> > Bejoy KS
> >> >
> >> > Sent from handheld, please excuse typos.
> >> > ________________________________
> >> > From: Jay Vyas <[EMAIL PROTECTED]>
> >> > Date: Fri, 19 Oct 2012 10:18:42 -0500
> >> > To: <[EMAIL PROTECTED]>
> >> > ReplyTo: [EMAIL PROTECTED]
> >> > Subject: Re: Hadoop counter
> >> >
> >> > Ah, this answers a lot about why some of my dynamic counters never show up
> >> > and I have to bite my nails waiting to see what's going on until the end of
> >> > the job. Thanks.
> >> >
> >> > Another question: what happens if a task fails? What happens to the
> >> > counters for it? Do they disappear into the ether? Or do they get merged
> >> > in with the counters from other tasks?
> >> >
> >> > On Fri, Oct 19, 2012 at 9:50 AM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >> And by default the number of counters is limited to 120 with the
> >> >> mapreduce.job.counters.limit property.
> >> >> They are useful for displaying short statistics about a job but should
> >> >> not be used for results (imho).
> >> >> I know people may misuse them but I haven't tried so I wouldn't be able
> >> >> to list the caveats.
> >> >>
> >> >> Regards
> >> >>
> >> >> Bertrand
> >> >>
> >> >>
> >> >> On Fri, Oct 19, 2012 at 4:35 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
> >> >>>
> >> >>> As I understand it... each Task has its own counters, which are
> >> >>> independently updated. As they report back to the JT, they update