Pig, mail # user - Run a job async


Re: Run a job async
Rohini Palaniswamy 2013-01-26, 00:23
Jon,
  Those are good areas to check. A few things I have seen regarding those:

 1) JythonScriptEngine - PythonInterpreter is static and is not suitable for
multiple runs if the script names are the same (I hit this issue in the
PIG-2433 unit tests).
 2) QueryParserDriver - There is a static cache mapping macro names to macro
files, so the same macro name with different file locations will cause
problems.
 3) FileLocalizer.relativeRoot - With a single cluster there are no issues.
It just needs to be reinitialized if multiple clusters are supported.
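
To illustrate the kind of clash described in 2), here is a minimal,
self-contained sketch. It does not use Pig's actual code: the class names
SharedCacheParser and PerInstanceParser, the resolve method, and the file
paths are all hypothetical stand-ins. It only shows how a static cache keyed
by macro name can hand a second run the first run's file, and how scoping the
cache to an instance avoids that.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StaticCacheSketch {

    // Hypothetical stand-in for a parser driver with a JVM-wide (static) cache.
    static class SharedCacheParser {
        private static final Map<String, String> MACRO_FILES = new ConcurrentHashMap<>();

        // Returns the file recorded for the macro name. The first registration
        // wins for the lifetime of the JVM, whichever run made it.
        String resolve(String macroName, String filePath) {
            return MACRO_FILES.computeIfAbsent(macroName, k -> filePath);
        }
    }

    // Same lookup logic, but the cache belongs to a single parser instance.
    static class PerInstanceParser {
        private final Map<String, String> macroFiles = new ConcurrentHashMap<>();

        String resolve(String macroName, String filePath) {
            return macroFiles.computeIfAbsent(macroName, k -> filePath);
        }
    }

    public static void main(String[] args) {
        // Two independent "runs" use the same macro name from different files.
        String a = new SharedCacheParser().resolve("my_macro", "/scripts/a/macros.pig");
        String b = new SharedCacheParser().resolve("my_macro", "/scripts/b/macros.pig");
        System.out.println("shared cache:       " + a + " | " + b);

        String c = new PerInstanceParser().resolve("my_macro", "/scripts/a/macros.pig");
        String d = new PerInstanceParser().resolve("my_macro", "/scripts/b/macros.pig");
        System.out.println("per-instance cache: " + c + " | " + d);
    }
}
```

With the shared cache, the second run gets the first run's path even though it
asked for a different file. With the per-instance cache, each run sees its own
file.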

Regards,
Rohini
On Fri, Jan 25, 2013 at 9:37 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> user to bcc, +dev
>
> Cheolsoo,
>
> Can you make a JIRA for this? I can imagine a slightly heavier test suite,
> but I like where you started. If it's not far off, then I think it'll be a
> win to make it thread safe. But we need to make sure to test the most
> advanced features: UDFs (especially the same name but a different UDF in
> different invocations), scripting UDFs (same thing), and so on.
>
>
> 2013/1/25 Cheolsoo Park <[EMAIL PROTECTED]>
>
> > >> if you have multiple threads that run a query via PigServer, there is a
> > >> great chance of the internals clashing because of the use of static
> > >> variables within Pig.
> >
> > Recently, I spent some time on this, and what I found is that the Pig
> > front-end is quite thread-safe. Here is how I tested it:
> >
> > 1) Wrote a PigUnit test that runs in MR mode.
> > 2) Executed test cases concurrently in 4 threads using a JUnit extension
> > called tempus-fugit:
> > http://tempusfugitlibrary.org/documentation/junit/parallel/
> >
> > After fixing PIG-3096, I was able to successfully run Pig queries in
> > parallel. It's important to note that only the front-end needs to be
> > thread-safe since that's what is executed in parallel.
> >
> > I arbitrarily selected queries from e2e test cases, so they are probably
> > not complex enough to mimic real-world examples. Nevertheless, my test
> > program ran without a problem for a few days. I couldn't continue my
> > experiment because I was pulled onto something else. However, I think
> > that making the front-end thread-safe is an achievable goal.
> >
> > Thanks,
> > Cheolsoo
> >
> >
> >
> > On Thu, Jan 24, 2013 at 11:18 PM, Ramakrishna Nalam
> > <[EMAIL PROTECTED]> wrote:
> >
> > > That clarifies it for me, thanks a lot.
> > >
> > > Regards,
> > > Rama.
> > >
> > >
> > > On Fri, Jan 25, 2013 at 10:09 AM, Jonathan Coveney <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > Well, when I say that Pig is not multi-threaded, what I mean is that if
> > > > you have multiple threads that run a query via PigServer, there is a
> > > > great chance of the internals clashing because of the use of static
> > > > variables within Pig. Pig itself, when running a single query, is
> > > > multi-threaded. It's just not "multi-threaded" in the sense that
> > > > multiple instances can safely be run in the same JVM.
> > > >
> > > >
> > > > 2013/1/24 Ramakrishna Nalam <[EMAIL PROTECTED]>
> > > >
> > > > > Hi Jonathan,
> > > > >
> > > > > Pardon me if this is a naive question, but it's interesting that you
> > > > > say Pig is not multithreaded.
> > > > > We're using Pig 0.10.0, and looking at the code, it seems to do the
> > > > > right things to handle multi-threaded requests (ThreadLocal for
> > > > > ScriptState, for example).
> > > > >
> > > > > It would be great if you could point out the kind of issues there
> > > > > could be.
> > > > >
> > > > >
> > > > > Regards,
> > > > > Rama.
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Jan 24, 2013 at 8:32 PM, Praveen M <[EMAIL PROTECTED]>
> > > > > wrote:
> > > > >
> > > > > > Are there any plans to make PigServer multi-threaded?
> > > > > >
> > > > > > Since there is a "PigProgressNotificationListener" to subscribe to
> > > > > > for async callbacks when the Pig job completes, is there any real
> > > > > > need to keep the job-submitting thread waiting until the job
> > > > > > completes?