Between the point where you submit a script to Pig and the point where the
MapReduce job starts executing on the cluster, the following are three
things that may take a while, depending on what is affecting you.
1) Your cluster is heavily loaded and the JobTracker is busy dealing with
other jobs, in which case it won't schedule the newly submitted job
right away. This can be alleviated by tweaking the JobTracker's
scheduling policy.
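For example, on a 0.20-era Hadoop cluster you can swap the default FIFO
scheduler for the Fair Scheduler so that small jobs are not stuck behind
long-running ones. This is a sketch of the relevant mapred-site.xml
entries; the allocation file path is a placeholder you would adjust for
your installation:

```xml
<!-- mapred-site.xml: replace the default FIFO scheduler -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<!-- optional: per-pool allocations (path is an example) -->
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>
```

The JobTracker needs a restart for the scheduler change to take effect.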
2) You are working with really large datasets (tens of thousands of
splits). In this case the input split calculation, which happens on the
client machine, may take a long while.
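To get a feel for whether you are in this regime, note that for
uncompressed, splittable input the client typically computes roughly one
split per HDFS block per file. Here is a rough back-of-the-envelope
sketch (the file sizes and 64 MB block size are illustrative
assumptions, not anything Pig reports):

```python
# Rough estimate of how many input splits the client must enumerate
# before the job can even be submitted. Assumes one split per HDFS
# block, which is the common case for uncompressed, splittable files.

BLOCK_SIZE = 64 * 1024 * 1024  # assumed default HDFS block size


def estimate_splits(file_sizes, block_size=BLOCK_SIZE):
    """Approximate split count: ceil(size / block_size) per file,
    with a minimum of one split per file."""
    return sum(max(1, -(-size // block_size)) for size in file_sizes)


# e.g. 50,000 files of 100 MB each: each file spans 2 blocks,
# so the client has to enumerate about 100,000 splits.
sizes = [100 * 1024 * 1024] * 50000
print(estimate_splits(sizes))  # → 100000
```

If the estimate runs into the tens or hundreds of thousands, the
client-side split calculation alone can account for minutes of delay.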
3) Your Pig script is quite large (tens of thousands of lines).
Currently Pig takes a bit of time to compile very large scripts.
Depending on your situation, you might be hitting one of these issues.
Or, there is some new issue which we will discover now :)
On Fri, Mar 26, 2010 at 04:43, jr <[EMAIL PROTECTED]> wrote:
> Hello everybody,
> I've noticed that when I run some Pig scripts, the creation of the
> actual Hadoop jobs takes quite a while, sometimes more than 15 minutes
> until the first map/reduce job starts.
> How can I accelerate this? Which machine does that, and what do I have
> to throw at it? Is it the Pig client machine that needs more beef?
> Thanks for your answers,