Pig >> mail # user >> Pig job is taking more time than Java M/R


Re: Pig job is taking more time than Java M/R
I am using Apache Pig version 0.11.0-SNAPSHOT (r1225753), built from trunk,
and Hadoop 0.20.205.
Nothing else was running on the cluster at that time, and there was no
waiting for map-reduce slots.
The only difference I saw was that my Java M/R job ran only 40 reducers,
whereas my Pig job ran 457 reducers. I suspect the extra time comes from
running so many reducers.
Can I control the number of reducers?
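Here is a minimal sketch of the two ways Pig lets you set the reducer count, reusing the aliases from the script quoted below. `SET default_parallel` applies to every reduce-side operator in the script. A `PARALLEL` clause overrides that default for a single operator. The value 40 is only an assumption chosen to match the Java job. It is not a recommended setting.

```pig
-- Option 1: script-wide default number of reducers (assumed value: 40)
SET default_parallel 40;

Data = LOAD 'Data.csv' USING PigStorage(',');
IDs = FOREACH Data GENERATE $0;

-- Option 2: override the reducer count for this one operator only
UniqueID = DISTINCT IDs PARALLEL 40;
DUMP UniqueID;
```

When neither is set, Pig (0.8 and later) estimates the reducer count from the input size. That estimate is governed by the `pig.exec.reducers.bytes.per.reducer` and `pig.exec.reducers.max` properties, and it may account for the 457 reducers. Fewer reducers are not automatically faster, so it is worth timing both settings.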

Thanks,
Praveenesh
On Mon, Jan 16, 2012 at 11:42 AM, Prashant Kommireddi
<[EMAIL PROTECTED]> wrote:

> Hi Praveenesh,
>
> You can use 'EXPLAIN' to understand what Pig is doing under the hood (MR
> plan)
> http://pig.apache.org/docs/r0.9.1/test.html#explain
>
> What version of Pig and Hadoop are you using? I have never seen such a huge
> difference between Java MR and Pig. At the time you ran Pig, was the
> cluster idle or did you have other jobs running at the same time? Did you
> make sure the job was not waiting on Map or Reduce slots being made
> available?
>
> Thanks,
> Prashant
>
> On Sun, Jan 15, 2012 at 9:47 PM, praveenesh kumar <[EMAIL PROTECTED]> wrote:
>
> > Hey Guys,
> >
> > Is there any way I can see the M/R jobs that Pig runs
> > internally for a given Pig script?
> > I wanted to get unique values for a particular column.
> >
> > For that I wrote the following script:
> >
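> > -- Load the comma-separated file; fields are untyped and addressed by position
> > Data = LOAD 'Data.csv' USING PigStorage(',');
> > -- Keep only the first column
> > IDs = FOREACH Data GENERATE $0;
> > -- Remove duplicate values (DISTINCT runs in a reduce phase)
> > UniqueID = DISTINCT IDs;
> > DUMP UniqueID;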
> >
> > Is this the right/best way to get the unique values of a particular column?
> >
> > The reason I am asking is that when I ran the above script on my cluster,
> > it took around 30 minutes to finish.
> > However, when I wrote traditional Java M/R code for the same task, it
> > took only 10 minutes.
> >
> > So I want to see what Pig is doing internally.
> > Can anyone tell me what could be the reason for this behaviour? How can I
> > decrease the Pig execution time?
> >
> > Thanks,
> > Praveenesh
> >
>
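Following the EXPLAIN suggestion above, here is a minimal sketch that reuses the script from the original message. Replacing the final DUMP with EXPLAIN prints the logical, physical, and MapReduce plans for `UniqueID` without running the job. This shows how many M/R jobs Pig compiles the script into and which operators run in each map and reduce phase.

```pig
Data = LOAD 'Data.csv' USING PigStorage(',');
IDs = FOREACH Data GENERATE $0;
UniqueID = DISTINCT IDs;

-- Print the logical, physical and MapReduce plans for UniqueID
-- instead of executing it (used in place of DUMP)
EXPLAIN UniqueID;
```

Comparing the "Map Reduce Plan" section of the output with the structure of the hand-written Java job is one way to spot where the extra time goes.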