Re: Pig optimization getting in the way?
I hoped that was the case, but JVM reuse is turned off:

    mapred.job.reuse.jvm.num.tasks = 1
However, it does seem to be doing the writes to the two DB tables in the same
job, so although it's not reusing the JVM, both writes already happen in one
JVM since they run in the same task!

And since the DB connection is static/singleton as you mentioned, and the
table name (which is the only thing that differs) is not part of the
connection URL, the two stores share the same DB connection, and whichever
one finishes first closes it out from under the other.

Hmm, any suggestions on how we can handle this? One idea is sketched below.
Thanks.
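For illustration only, here is one way the storer could tolerate two STOREs
sharing a task: reference-count the static connection instead of closing it
outright. This is a hypothetical sketch, not the actual myJDBC code (which
isn't shown in this thread), and all names in it are made up:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper: one shared connection per task, closed only when
// the last storer using it calls release().
public final class SharedConnection {
    private static Connection conn;
    private static int refCount = 0;

    // Each storer calls this when it starts writing.
    public static synchronized Connection acquire(String url) throws SQLException {
        if (conn == null || conn.isClosed()) {
            conn = DriverManager.getConnection(url);
            refCount = 0;
        }
        refCount++;
        return conn;
    }

    // Each storer calls this where it currently calls conn.close();
    // only the last release actually closes the connection.
    public static synchronized void release() throws SQLException {
        if (refCount > 0 && --refCount == 0) {
            conn.close();
            conn = null;
        }
    }
}

With something like this, both STOREs could stay in the same job. The other
obvious route is to drop the static/singleton and open one connection per
storer instance, at the cost of an extra connection per task.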

On Fri, Feb 18, 2011 at 3:38 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Let me guess -- you have a static JDBC connection that you open in myJDBC,
> and you have jvm reuse turned on.
>
> On Fri, Feb 18, 2011 at 1:41 PM, Dexin Wang <[EMAIL PROTECTED]> wrote:
>
> > I ran into a problem that I've spent quite some time on, and I'm starting
> > to think it's probably Pig doing some optimization that makes this hard.
> >
> > This is my pseudo code:
> >
> > raw = LOAD ...
> >
> > then some crazy stuff like
> >   filter
> >   join
> >   group
> >   UDF
> >   etc.
> >
> > A = the result of the operations above
> > STORE A INTO 'dummy' USING myJDBC('table1');  -- write to table1
> >
> > This works fine and I have 4 map-red jobs.
> >
> > Then I add this after that:
> >
> > B = FILTER A BY col1 == 'xyz';
> > STORE B INTO 'dummy2' USING myJDBC('table2');  -- write to table2
> >
> > Basically I do some filtering of A and write it to another table through
> > JDBC.
> >
> > Then I ran into jobs failing with "PSQLException: This statement has been
> > closed".
> >
> > My workaround for now is to add "EXEC;" before the B line, which makes the
> > two writes to the DB happen in sequence. This works, but it runs the same
> > map-red jobs twice - I ended up with 8 jobs.
> >
> > I think the reason for the failure without the EXEC line is that Pig tries
> > to do the two STOREs in the same reducer (or maybe mapper), since B only
> > involves a FILTER, which doesn't require a separate map-red job, and then
> > things get confused.
> >
> > Is there a way for this to work without having to duplicate the jobs?
> > Thanks a lot!
> >
>
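
PS: I believe what's batching the two STOREs into one plan is Pig's
multi-query optimization. As far as I know it can be turned off wholesale
when launching the script (pig -M or pig -no_multiquery), but that looks like
the same trade-off as the EXEC barrier: the shared part of the pipeline runs
once per STORE. Keeping both STOREs in one job seems to require the storer
itself to tolerate sharing, along the lines of the sketch above.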