Re: Managing pig script jar dependencies
Alejandro Abdelnur 2011-01-21, 07:23
In Oozie we ran into a similar problem.

As workflows with Pig actions proliferated, the lib/ directory of each
workflow app had to contain Pig and its dependent JARs. This becomes a
nightmare to maintain as the number of workflow apps increases.

The approach to solve this was to add to Oozie the concept of a sharelib/
directory in HDFS. You copy to the sharelib/ all the JARs you want to use
across multiple workflow applications. When submitting a workflow you can
specify the sharelib/ dir you want to use, or you can tell Oozie to use the
system sharelib/ (the default one). Oozie then adds all the JARs in the
specified sharelib/ to the distributed cache for the Pig job.

The benefits of this approach are that JAR files are stored only once in
HDFS and can be managed and updated globally. And users won't miss a JAR
by mistake.

This feature is coming in Oozie 2.3.

Pig could easily have a -sharelib option that points to an HDFS sharelib/
directory, thus achieving the same.

<ad>
BTW, as Oozie supports submitting Pig jobs over Oozie, doing
'oozie pig -f ....' you can get the feature for free, plus Oozie becomes a
Pig server (you get a job ID and you can track progress later), all this
without having to write a workflow.
</ad>

Hope this helps.

Alejandro

On Fri, Jan 21, 2011 at 2:44 PM, Erik Onnen <[EMAIL PROTECTED]> wrote:

> As a new member to the list, I offer our lone data point. We use the maven
> shade plugin: http://maven.apache.org/plugins/maven-shade-plugin/
>
> Shade produces an "uber" JAR with an optional declared main class.
>
> On the up side, for a reasonable number of dependencies (in our case ~40),
> it just works and results in a single JAR. We're lucky enough that across
> the board, we can use one JAR for launching a message consumer, an Hadoop
> Job, and a Pig job.
>
> That said, there are two caveats we've encountered:
> * System dependencies aren't rolled into the "uber" JAR - if you want
>   something to be in the deployment artifact, you need to at a minimum put
>   it into your local repo - we do this via bash scripting for HBase 0.90.0
>   for example.
> * Conflicts - so far we've managed to do a maven dependency:tree and
>   exclude conflicting dependencies, but I'm sure there is a point where
>   that will not work any more.
>
> I'd love to hear how others are solving the problem, so far this has worked
> for us.
>
> -erik
>
>
> On Thu, Jan 20, 2011 at 7:31 PM, Kaluskar, Sanjay <[EMAIL PROTECTED]> wrote:
>
> > Hi Dmitriy,
> >
> > Well, what I have is still experimental & not in any product. But, yes
> > we can compile to a Pig script. I try to use the native relational
> > operators where possible & use UDFs in other cases.
> >
> > I don't understand which conflicts you are referring to. Initially, I
> > was trying to create a single jar (containing all the 300 dependencies)
> > using the maven-dependency-plugin (BTW that seems to be the recommended
> > approach & should work in many cases) but it turned out that some of our
> > internal components had conflicting file names for some of the resources
> > (should probably be fixed!). My current approach works better because I
> > don't try to re-package any dependency. Yes, startup times are slow - of
> > course, I am open to other ideas :-)
> >
> > -----Original Message-----
> > From: Dmitriy Ryaboy [mailto:[EMAIL PROTECTED]]
> > Sent: 21 January 2011 07:57
> > To: [EMAIL PROTECTED]
> > Subject: Re: Managing pig script jar dependencies
> >
> > Sanjay,
> > Informatica compiles to Pig now, eh? Interesting...
> > How do you handle jar conflicts if you bundle the whole lot? Doesn't
> > this cost you a lot on job startup time?
> >
> > Dmitriy
> >
> >
> > On Thu, Jan 20, 2011 at 5:41 PM, Kaluskar, Sanjay <[EMAIL PROTECTED]> wrote:
> >
> > > I have a similar problem and I can tell you what I am doing currently,
> > > just in case it is useful. I have a tool that generates PIG scripts