Pig, mail # user - Run a job async


Re: Run a job async
Cheolsoo Park 2013-01-25, 17:08
>> if you have multiple threads that run a query via PigServer, there is a
>> great chance of the internals clashing because of the use of static
>> variables within Pig.

Recently, I spent some time on this, and what I found is that the Pig
front-end is quite thread-safe. Here is how I tested it:

1) Wrote a PigUnit test that runs in MR mode.
2) Executed test cases concurrently in 4 threads using a JUnit extension
called tempus-fugit:
http://tempusfugitlibrary.org/documentation/junit/parallel/

After fixing PIG-3096, I was able to successfully run Pig queries in
parallel. It's important to note that only the front-end needs to be
thread-safe since that's what is executed in parallel.
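
For illustration, here is a minimal Java sketch of the kind of harness described above, where four worker threads each run one query at the same time. This is not the actual test. `ParallelQueryHarness`, `runInParallel`, and the `runner` function are hypothetical names, the runner is a stub rather than a call into PigUnit or PigServer, and it uses the JDK's `ExecutorService` (with Java 8 lambdas) instead of tempus-fugit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelQueryHarness {

    // Runs every query on a fixed-size thread pool and returns the results
    // in submission order. 'runner' is a hypothetical stand-in for whatever
    // actually executes a query (for example, a PigUnit test body).
    static List<String> runInParallel(List<String> queries, int threads,
                                      Function<String, String> runner) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String q : queries) {
                tasks.add(() -> runner.apply(q));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task has finished.
            for (Future<String> f : pool.invokeAll(tasks)) {
                // get() rethrows a failed task's exception wrapped in ExecutionException.
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder query names; a real test would pass Pig Latin scripts.
        List<String> queries = Arrays.asList("query1", "query2", "query3", "query4");
        List<String> out = runInParallel(queries, 4,
                q -> "ran " + q + " on " + Thread.currentThread().getName());
        out.forEach(System.out::println);
    }
}
```

Any exception thrown by one of the concurrent runs surfaces from `f.get()`, so a clash between threads would fail the harness instead of passing silently.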

I arbitrarily selected queries from e2e test cases, so they are probably
not complex enough to mimic real-world examples. Nevertheless, my test
program ran without a problem for a few days. I couldn't continue my
experiment because I was pulled onto something else. However, I think
that making the front-end thread-safe is an achievable goal.
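
As a rough illustration of the static-variable concern quoted at the top of this message, the Java sketch below contrasts a single static field shared by all threads with a `ThreadLocal` that gives each thread its own slot. The class and field names (`SharedStateDemo`, `sharedScript`, `localScript`) are hypothetical and are not actual Pig fields. Latches force thread B to write in between thread A's write and A's read, so the outcome does not depend on timing.

```java
import java.util.concurrent.CountDownLatch;

final class SharedStateDemo {
    // Hypothetical stand-ins, not real Pig internals.
    static volatile String sharedScript;                                 // one slot for all threads
    static final ThreadLocal<String> localScript = new ThreadLocal<>();  // one slot per thread

    // Returns what thread A reads back after thread B has written in between:
    // index 0 is the shared static field, index 1 is the ThreadLocal.
    static String[] readBackAfterInterleaving() throws InterruptedException {
        CountDownLatch aWrote = new CountDownLatch(1);
        CountDownLatch bWrote = new CountDownLatch(1);
        String[] seenByA = new String[2];

        Thread a = new Thread(() -> {
            sharedScript = "A";
            localScript.set("A");
            aWrote.countDown();
            try {
                bWrote.await(); // wait until B has overwritten the shared field
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            seenByA[0] = sharedScript;      // clobbered by B
            seenByA[1] = localScript.get(); // still A's own value
        });

        Thread b = new Thread(() -> {
            try {
                aWrote.await(); // make sure A has written first
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            sharedScript = "B";
            localScript.set("B");
            bWrote.countDown();
        });

        a.start();
        b.start();
        a.join();
        b.join();
        return seenByA;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] seen = readBackAfterInterleaving();
        System.out.println("static field seen by A: " + seen[0]);
        System.out.println("ThreadLocal seen by A:  " + seen[1]);
    }
}
```

In this sketch thread A reads back B's value from the static field but its own value from the `ThreadLocal`, which is the difference between state that clashes across concurrent queries and state that does not.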

Thanks,
Cheolsoo

On Thu, Jan 24, 2013 at 11:18 PM, Ramakrishna Nalam <[EMAIL PROTECTED]> wrote:

> That clarifies it for me, thanks a lot.
>
> Regards,
> Rama.
>
>
> On Fri, Jan 25, 2013 at 10:09 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>
> > Well, when I say that Pig is not multi-threaded, what I mean is that if you
> > have multiple threads that run a query via PigServer, there is a great
> > chance of the internals clashing because of the use of static variables
> > within Pig. Pig itself, when running a single query, is multi-threaded.
> > It's just not "multi-threaded" in the sense that multiple instances can
> > safely be run in the same JVM.
> >
> >
> > 2013/1/24 Ramakrishna Nalam <[EMAIL PROTECTED]>
> >
> > > Hi Jonathan,
> > >
> > > Pardon me if this is a naive question, but it's interesting that you say
> > > Pig is not multithreaded. We're using Pig 0.10.0, and looking at the code,
> > > it seems to do the right things to handle multi-threaded requests (e.g.
> > > ThreadLocal for ScriptState).
> > >
> > > It would be great if you could point out the kinds of issues there could be.
> > >
> > >
> > > Regards,
> > > Rama.
> > >
> > >
> > >
> > > On Thu, Jan 24, 2013 at 8:32 PM, Praveen M <[EMAIL PROTECTED]> wrote:
> > >
> > > > Are there any plans to make PigServer multi-threaded?
> > > >
> > > > Since there is a "PigProgressNotificationListener" to subscribe to for
> > > > async callbacks when the Pig job completes, is there any real need to keep
> > > > the job-submitting thread waiting until the job completes?
> > > >
> > > > Is this just a shortcoming today, or are there more concrete reasons
> > > > against providing a PigServer that can submit to the cluster
> > > > asynchronously in mapreduce mode?
> > > >
> > > > Thanks,
> > > > Praveen
> > > >
> > > >
> > > >
> > > > On Wed, Jan 23, 2013 at 10:56 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > I think whatever way you slice it, handling thousands of Pig jobs
> > > > > asynchronously is going to be a bear. I mean, this is essentially what
> > > > > the job tracker does, albeit with a lot less information.
> > > > >
> > > > > Either way, Pig is not multi-threaded, so having more than one instance
> > > > > of Pig in the same JVM is going to start causing problems (which is why,
> > > > > I imagine, there is no async way to call Pig). So multiple processes is
> > > > > really the only way around it that I know of.
> > > > >
> > > > > At Twitter we have a deployment of Mesos, and our long-term solution is
> > > > > going to be running all of our Pig jobs on Mesos, in the short term by
> > > > > deploying daemons that run Pig jobs as local processes.
> > > > >
> > > > >
> > > > > 2013/1/23 Prashant Kommireddi <[EMAIL PROTECTED]>
> > > > >
> > > > > > Both. Think of it as an app server handling all of these requests.
> > > > > >
>