Pig, mail # user - Tracking parts of a job taking the most time


Re: Tracking parts of a job taking the most time
Pradeep Gollakota 2013-06-06, 11:22
This may not be what you're looking for, but you can also try using Twitter
Ambrose to monitor your Pig scripts as a whole.

https://github.com/twitter/ambrose

Not sure what you mean by specific parts of the script. If you mean each
operation, I don't think there's a mechanism for that, since Pig usually
compiles multiple operations into a single MapReduce job. Like Ruslan said,
you can look at the performance of each job on the JobTracker. You might
be able to do what Johnny suggested, but with multi-query execution
disabled Pig may plan the jobs differently, so you might not get a true
estimate for your scripts.
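
To make Ruslan's suggestion a bit more concrete, here is a minimal sketch of pulling the job IDs out of the "Job DAG:" section that Pig prints when a script finishes, so each one can be looked up on the JobTracker. The sample text is copied from the quoted message below (including its elided last line); the function name `parse_job_dag` and the assumption that every line has the form `job -> job,` are illustrative, not part of Pig itself.

```python
import re

# Sample copied from the "Job DAG:" lines quoted in this thread.
# The trailing "..." is left elided exactly as it was posted.
SAMPLE = """\
Job DAG:
job_201304081613_0032   ->      job_201304081613_0033,
job_201304081613_0033   ->      job_201304081613_0034,
job_201304081613_0034   ->    ...
"""

# Assumed shape of a Hadoop job id: job_<cluster timestamp>_<sequence number>.
JOB_ID = re.compile(r"job_\d+_\d+")


def parse_job_dag(text):
    """Return {job_id: [downstream job ids]} for a 'Job DAG:' section.

    Hypothetical helper: it assumes the first id on each line is the
    upstream job and any further ids on that line are its successors.
    """
    dag = {}
    in_dag = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Job DAG:"):
            in_dag = True
            continue
        if not in_dag:
            continue
        ids = JOB_ID.findall(line)
        if not ids:
            break  # first line without a job id ends the section
        dag.setdefault(ids[0], []).extend(ids[1:])
        for child in ids[1:]:
            dag.setdefault(child, [])
    return dag


if __name__ == "__main__":
    for job, children in parse_job_dag(SAMPLE).items():
        print(job, "->", ", ".join(children) or "(no downstream job listed)")
```

Each key is one MR job whose timing can then be checked individually on the JobTracker; the sketch does not measure time itself.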
On Thu, Jun 6, 2013 at 6:57 AM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:

> John,
>
> I think DAG here stands for directed acyclic graph:
> http://en.wikipedia.org/wiki/Directed_acyclic_graph Anyway, what I meant
> was the list of the generated MR jobs. When you launch a Pig script from
> the command line, you get something like this:
> INFO... job url... http://yourcluster:...jobid
> every time an MR job is launched.
>
> Then, when the script is finished, you get the full list of job IDs,
> something like:
> Job DAG:
> job_201304081613_0032   ->      job_201304081613_0033,
> job_201304081613_0033   ->      job_201304081613_0034,
> job_201304081613_0034   ->    ...
>
> Let me know if you have further questions
>
>
> On Wed, Jun 5, 2013 at 2:29 PM, John Meek <[EMAIL PROTECTED]> wrote:
>
> > Hi Ruslan,
> > Not sure how to do this. Can you be more specific? What's a DAG? Thanks.
> >
> > -----Original Message-----
> > From: Ruslan Al-Fakikh <[EMAIL PROTECTED]>
> > To: user <[EMAIL PROTECTED]>
> > Sent: Wed, Jun 5, 2013 4:04 am
> > Subject: Re: Tracking parts of a job taking the most time
> >
> >
> > Hi!
> >
> > You can look at the Pig script stats after the script is finished. There
> > is a DAG of MR jobs there. You can look at the individual MR jobs' stats
> > to see how much time each MR job takes.
> >
> > Ruslan
> >
> >
> > On Wed, Jun 5, 2013 at 10:15 AM, Johnny Zhang <[EMAIL PROTECTED]>
> > wrote:
> >
> > > How about disabling multi-query execution and using the CurrentTime UDF
> > > to print the time between each script block?
> > >
> > > Johnny
> > >
> > >
> > > On Tue, Jun 4, 2013 at 7:11 PM, John Meek <[EMAIL PROTECTED]> wrote:
> > >
> > > > All,
> > > >
> > > > I have a 400-line Pig script which performs the calculations I need
> > > > it to perform; however, I need to figure out how much time specific
> > > > parts of the script take.
> > > >
> > > > For example, the initial load from an HBase table: I'd like to know
> > > > how much time the load takes before moving on to the next step.
> > > >
> > > > What's the easiest way to break this down?
> > > >
> > > >
> > > > thanks,
> > > > JM
> > > >
> > >
> >
> >
> >
>