Pig, mail # user - Pig job is taking more time than Java M/R


Pig job is taking more time than Java M/R
praveenesh kumar 2012-01-16, 05:47
Hey Guys,

Is there any way to see the M/R jobs that Pig runs internally for a
given Pig script?
I want to get the unique values of a particular column.

For that I wrote the following script:

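-- Load the CSV; fields are addressed by position ($0 is the first column).
Data = LOAD 'Data.csv' USING PigStorage(',');
-- Keep only the first column.
IDs = FOREACH Data GENERATE $0;
-- Drop duplicate values and print the result.
UniqueID = DISTINCT IDs;
DUMP UniqueID;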

Is this the right/best way to get the unique values of a particular column?

The reason I am asking is that when I ran the above script on my cluster,
it took around 30 minutes to finish.
However, when I wrote traditional Java M/R code for the same task, it
took only 10 minutes.

So I want to see what Pig is doing internally.
Can anyone tell me what could be causing this behaviour? How can I
reduce Pig's execution time?

Thanks,
Praveenesh