Re: Apache PIG Record Count
Hi,
Our single Pig script runs two jobs, as listed below.

final = COGROUP cdr2 BY mobno INNER, crm1 BY mobno INNER;
final1 = FOREACH final GENERATE flatten(cdr2),
    flatten(crm1.(customerId, region, age_group, customerSegment));
STORE final1 INTO '/final_output2' USING PigStorage('$');
final2 = JOIN cdr2 BY mobno, crm1 BY mobno;
STORE final2 INTO '/final_output1' USING PigStorage('$');

STORE final1 is the first job.
STORE final2 is the second job.

In our Java class we access the JobGraph object through PigStats, as shown
below.
PigStats lStats = PigStats.get();
JobGraph lJobGraph = lStats.getJobGraph();

lJobGraph.getJobList().size() returns 1, and that single entry is the last
job executed in the Pig script (final2). We cannot get a handle on the
first job (final1).

While the Pig script runs, the console prints information about two jobs,
but the Java client program does not reflect this. Is something missing in
our code?
Please help us resolve this issue.
Shweta.
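
One possible explanation, offered as an assumption rather than something confirmed in this thread: when batch mode is off, PigServer.registerScript runs each STORE as soon as it is reached, so the thread-local PigStats holds only the most recent run. The sketch below turns batch mode on so that both STORE statements execute in a single run, then walks the JobGraph and prints per-job input and output record counts. The class name RecordCounts and the default script path myscript.pig are hypothetical. The code needs the Pig and Hadoop jars on the classpath and targets the 0.12-era API, so method availability may differ in other versions.

```java
import java.io.IOException;
import java.util.List;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.backend.executionengine.ExecJob;
import org.apache.pig.tools.pigstats.InputStats;
import org.apache.pig.tools.pigstats.JobStats;
import org.apache.pig.tools.pigstats.OutputStats;
import org.apache.pig.tools.pigstats.PigStats;

public class RecordCounts {
    public static void main(String[] args) throws IOException {
        // Hypothetical script path; pass the real one as the first argument.
        String script = args.length > 0 ? args[0] : "myscript.pig";

        PigServer pig = new PigServer(ExecType.MAPREDUCE);
        try {
            // Queue statements instead of executing each STORE immediately.
            pig.setBatchOn();
            pig.registerScript(script);

            // Runs everything queued so far and returns one ExecJob per STORE.
            List<ExecJob> jobs = pig.executeBatch();
            if (jobs.isEmpty()) {
                System.out.println("No STORE statements were executed.");
                return;
            }

            // Assumption: all ExecJobs from one batch share the same PigStats,
            // so reading it from the first one is enough.
            PigStats stats = jobs.get(0).getStatistics();

            // JobGraph is Iterable<JobStats>: one entry per MapReduce job.
            for (JobStats job : stats.getJobGraph()) {
                System.out.println("Job " + job.getJobId());
                for (InputStats in : job.getInputs()) {
                    System.out.println("  input  " + in.getLocation()
                            + ": " + in.getNumberRecords() + " records");
                }
                for (OutputStats out : job.getOutputs()) {
                    System.out.println("  output " + out.getLocation()
                            + ": " + out.getNumberRecords() + " records");
                }
            }
        } finally {
            pig.shutdown();
        }
    }
}
```

If the JobGraph still contains a single entry, Pig may have merged both STORE statements into one MapReduce job. In that case the per-output counts from getOutputs() should still list each output location separately.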

On Sun, Dec 29, 2013 at 8:04 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Not sure I understand you. PigStats holds a collection of JobStats, and
> each MR job should create a JobStats object. You should be able to iterate
> them via JobGraph.
>
> Specifically, the input/output record counts are read from Hadoop counters.
> For example,
>
> https://github.com/apache/pig/blob/trunk/src/org/apache/pig/tools/pigstats/mapreduce/MRJobStats.java#L287
>
> Are you not able to iterate JobStats? Or are you not seeing any counters in
> JobStats? More information will help.
>
>
>
> On Sat, Dec 28, 2013 at 1:08 AM, Shweta Jadhav <[EMAIL PROTECTED]> wrote:
>
> > Hi
> > Thanks for your reply
> > I tried getting the record count using PigStats.get(),
> > but I am getting stats for only the last MapReduce job.
> > My Pig script runs two MR jobs.
> > I also want to get the record count for the first MR job.
> >
> > Thanks
> >
> >
> > On Wed, Dec 25, 2013 at 1:06 PM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
> >
> > > Well, JobStats can be retrieved indirectly.
> > >
> > > You can get a handle on PigStats via ExecJob:
> > >
> > > https://github.com/apache/pig/blob/trunk/src/org/apache/pig/PigServer.java#L381
> > >
> > > PigStats in turn returns a JobGraph that lets you iterate over JobStats:
> > >
> > > https://github.com/apache/pig/blob/trunk/src/org/apache/pig/tools/pigstats/PigStats.java#L376
> > >
> > > Alternatively, you could call PigStats.get() to directly access the
> > > PigStats thread-local variable. If you do this, make sure you call it
> > > after PigServer finishes the execution, or you will end up with null.
> > >
> > > This area of code has changed quite a bit in trunk, but something similar
> > > should work in older versions.
> > >
> > >
> > > On Mon, Dec 23, 2013 at 9:57 PM, Shweta Jadhav <[EMAIL PROTECTED]> wrote:
> > >
> > > > How do I retrieve JobStats from a PigServer object?
> > > > There is no method defined for this in the PigServer class.
> > > >
> > > >
> > > > On Tue, Dec 24, 2013 at 3:12 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > See JobStats and how you can retrieve it from PigServer:
> > > > >
> > > > > https://github.com/apache/pig/blob/branch-0.12/src/org/apache/pig/tools/pigstats/JobStats.java#l170
> > > > >
> > > > >
> > > > > On Sun, Dec 22, 2013 at 8:58 PM, Shweta Jadhav <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am running a Pig script using the Java API (PigServer.registerScript).
> > > > > > I need to find the number of records processed and the number of
> > > > > > output records using the Java API.
> > > > > > How can I implement this?
> > > > > > Thanks
> > > > > > Shweta Jadhav
> > > > > >
> > > > >
> > > >
> > >
> >
>