

Re: Apache PIG Record Count
I'm not sure I understand. PigStats holds a collection of JobStats objects,
and each MR job should create one. You should be able to iterate over them
via the JobGraph.

Specifically, the input/output record counts are read from Hadoop counters.
For example,
https://github.com/apache/pig/blob/trunk/src/org/apache/pig/tools/pigstats/mapreduce/MRJobStats.java#L287

Are you not able to iterate JobStats? Or are you not seeing any counters in
JobStats? More information will help.
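
To illustrate the per-job iteration, here is a minimal, self-contained sketch. It does not call Pig: JobCounters is a hypothetical stand-in for whatever you read from each JobStats, and the job ids and counts in main are made-up sample values. With Pig on the classpath, the loop would run over the JobStats objects that the JobGraph yields instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PerJobRecordCounts {

    /** Hypothetical stand-in for the counters of one MR job; not a Pig class. */
    static final class JobCounters {
        final String jobId;
        final long inputRecords;
        final long outputRecords;

        JobCounters(String jobId, long inputRecords, long outputRecords) {
            this.jobId = jobId;
            this.inputRecords = inputRecords;
            this.outputRecords = outputRecords;
        }
    }

    /**
     * Collects {input, output} record counts for every job, keyed by job id.
     * A LinkedHashMap keeps the jobs in the order they were iterated, so the
     * first MR job is reported as well as the last one.
     */
    static Map<String, long[]> countsByJob(Iterable<JobCounters> jobs) {
        Map<String, long[]> counts = new LinkedHashMap<>();
        for (JobCounters job : jobs) {
            counts.put(job.jobId, new long[] {job.inputRecords, job.outputRecords});
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample values only; a real run would fill these from each job's counters.
        List<JobCounters> jobs = new ArrayList<>();
        jobs.add(new JobCounters("job_0001", 1000L, 800L));
        jobs.add(new JobCounters("job_0002", 800L, 50L));

        for (Map.Entry<String, long[]> entry : countsByJob(jobs).entrySet()) {
            System.out.println(entry.getKey()
                    + " input=" + entry.getValue()[0]
                    + " output=" + entry.getValue()[1]);
        }
    }
}
```

The point of the sketch is that the counts are gathered inside the loop, once per job, rather than read from a single summary value after the script finishes.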

On Sat, Dec 28, 2013 at 1:08 AM, Shweta Jadhav <[EMAIL PROTECTED]> wrote:

> Hi,
> Thanks for your reply.
> I tried getting the record count using PigStats.get(),
> but I am getting the stats of only the last MapReduce job.
> My Pig script runs two MR jobs,
> and I also want the record count for the first MR job.
>
> Thanks
>
>
> On Wed, Dec 25, 2013 at 1:06 PM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
>
> > Well, JobStats can be retrieved indirectly.
> >
> > You can get a handle on PigStats via ExecJob:
> >
> > https://github.com/apache/pig/blob/trunk/src/org/apache/pig/PigServer.java#L381
> >
> > PigStats in turn returns a JobGraph that lets you iterate over JobStats:
> >
> > https://github.com/apache/pig/blob/trunk/src/org/apache/pig/tools/pigstats/PigStats.java#L376
> >
> > Alternatively, you could call PigStats.get() to directly access the
> > PigStats thread-local variable. If you do this, make sure you call it
> > after PigServer finishes the execution; otherwise you will end up with null.
> >
> > This area of code has changed quite a bit in trunk, but something similar
> > should work in older versions.
> >
> >
> > On Mon, Dec 23, 2013 at 9:57 PM, Shweta Jadhav <[EMAIL PROTECTED]> wrote:
> >
> > > How can I retrieve JobStats from the PigServer object?
> > > There is no method defined for this in the PigServer class.
> > >
> > >
> > > On Tue, Dec 24, 2013 at 3:12 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
> > >
> > > > See JobStats and how you can retrieve it from PigServer:
> > > >
> > > > https://github.com/apache/pig/blob/branch-0.12/src/org/apache/pig/tools/pigstats/JobStats.java#l170
> > > >
> > > >
> > > > On Sun, Dec 22, 2013 at 8:58 PM, Shweta Jadhav <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am running a Pig script using the Java API (PigServer.registerScript).
> > > > > I need to find the number of records processed and the number of
> > > > > output records using the Java API.
> > > > > How can I implement this?
> > > > > Thanks
> > > > > Shweta Jadhav
> > > > >
> > > >
> > >
> >
>