NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Pig >> mail # user >> Apache PIG Record Count


Re: Apache PIG Record Count
Not sure I understand you. PigStats holds a collection of JobStats, and
each MR job should create a JobStats object. You should be able to iterate
over them via JobGraph.

Specifically, the input/output record counts are read from Hadoop counters.
For example,
https://github.com/apache/pig/blob/trunk/src/org/apache/pig/tools/pigstats/mapreduce/MRJobStats.java#L287

Are you not able to iterate over JobStats? Or are you not seeing any counters
in JobStats? More information would help.
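A minimal sketch of that iteration, written against hypothetical stand-in classes so it compiles without Pig on the classpath: `JobCounts` and `report` are made-up names, not Pig's API. The point it illustrates is visiting every per-job entry instead of reading one summary value, so the first MR job's counts are reported alongside the last one's. In real code, the `Iterable` passed in would be the JobGraph obtained from PigStats (assuming, as described above, that it can be iterated over JobStats), and the counts would come from each JobStats object's Hadoop counters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, Pig-free model of "one stats entry per MapReduce job".
// In real Pig code the entries would be JobStats objects reached through PigStats' JobGraph.
class PerJobRecordCounts {

    // Made-up stand-in for a single job's record counters (NOT Pig's JobStats class).
    static final class JobCounts {
        final String jobId;
        final long inputRecords;
        final long outputRecords;

        JobCounts(String jobId, long inputRecords, long outputRecords) {
            this.jobId = jobId;
            this.inputRecords = inputRecords;
            this.outputRecords = outputRecords;
        }
    }

    // Visits every job (not just the last one) and formats its input/output record counts.
    static List<String> report(Iterable<JobCounts> jobs) {
        List<String> lines = new ArrayList<>();
        for (JobCounts job : jobs) {
            lines.add(job.jobId + ": input=" + job.inputRecords + ", output=" + job.outputRecords);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up numbers for a two-job script, purely for illustration.
        List<JobCounts> jobs = Arrays.asList(
                new JobCounts("job_1", 1000, 250),
                new JobCounts("job_2", 250, 12));
        for (String line : report(jobs)) {
            System.out.println(line);
        }
    }
}
```

Running `main` prints one line per job, so the counts for the first MR job are no longer hidden behind those of the last.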

On Sat, Dec 28, 2013 at 1:08 AM, Shweta Jadhav <[EMAIL PROTECTED]> wrote:

> Hi,
> Thanks for your reply.
> I tried getting the record count using PigStats.get(),
> but I am getting stats for only the last MapReduce job.
> My Pig script runs two MR jobs,
> and I also want to get the record count for the first MR job.
>
> Thanks
>
>
> On Wed, Dec 25, 2013 at 1:06 PM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
>
> > Well, JobStats can be retrieved indirectly.
> >
> > You can get a handle on PigStats via ExecJob:
> >
> > https://github.com/apache/pig/blob/trunk/src/org/apache/pig/PigServer.java#L381
> >
> > PigStats in turn returns a JobGraph that lets you iterate over JobStats:
> >
> > https://github.com/apache/pig/blob/trunk/src/org/apache/pig/tools/pigstats/PigStats.java#L376
> >
> > Alternatively, you could call PigStats.get() to directly access the
> > PigStats thread-local variable. If you do this, make sure you call it
> > after PigServer finishes execution; otherwise you will end up with null.
> >
> > This area of code has changed quite a bit in trunk, but something similar
> > should work in older versions.
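The null result is ordinary `java.lang.ThreadLocal` behavior. Below is a hypothetical sketch (the names `FakeStats`, `runScript`, `current` and `readOnFreshThread` are illustrative stand-ins, not Pig's implementation) that publishes stats to a thread-local only when a run finishes. Reading it earlier returns null, and a thread other than the one that ran the script never sees the value.

```java
// Hypothetical model of a "stats of the current run" thread-local.
// This is NOT Pig's code; it only mimics the behavior described in the reply above.
class ThreadLocalStatsDemo {

    // Made-up stand-in for a stats object; it only records how many jobs ran.
    static final class FakeStats {
        final int jobCount;

        FakeStats(int jobCount) {
            this.jobCount = jobCount;
        }
    }

    // Each thread gets its own slot, which starts out empty (null).
    private static final ThreadLocal<FakeStats> CURRENT = new ThreadLocal<>();

    // Static accessor in the spirit of PigStats.get(): simply reads this thread's slot.
    static FakeStats current() {
        return CURRENT.get();
    }

    // Simulated script run: stats are published only once the run has finished.
    static void runScript(int jobCount) {
        // ... MapReduce jobs would execute here ...
        CURRENT.set(new FakeStats(jobCount));
    }

    // Calls current() on a brand-new thread, whose slot has never been set.
    static FakeStats readOnFreshThread() {
        final FakeStats[] seen = new FakeStats[1];
        Thread t = new Thread(() -> seen[0] = current());
        t.start();
        try {
            t.join(); // join() also makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("before run: " + current());             // before run: null
        runScript(2);
        System.out.println("after run: " + current().jobCount);     // after run: 2
        System.out.println("other thread: " + readOnFreshThread()); // other thread: null
    }
}
```

The design choice being modeled is that the accessor does no work of its own; it only returns whatever the finished run left behind on the calling thread.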
> >
> >
> > On Mon, Dec 23, 2013 at 9:57 PM, Shweta Jadhav <[EMAIL PROTECTED]> wrote:
> >
> > > How do I retrieve JobStats from a PigServer object?
> > > There is no method defined for this in the PigServer class.
> > >
> > >
> > > On Tue, Dec 24, 2013 at 3:12 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
> > >
> > > > See JobStats and how you can retrieve it from PigServer:
> > > >
> > > > https://github.com/apache/pig/blob/branch-0.12/src/org/apache/pig/tools/pigstats/JobStats.java#l170
> > > >
> > > >
> > > > On Sun, Dec 22, 2013 at 8:58 PM, Shweta Jadhav <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am running a Pig script using the Java API (PigServer.registerScript).
> > > > > I need to find out the number of records processed and the number of
> > > > > output records using the Java API.
> > > > > How can I implement this?
> > > > > Thanks
> > > > > Shweta Jadhav
> > > > >
> > > >
> > >
> >
>