Pig >> mail # dev >> Pig 0.7  Release Engineering and how to drop 18 MB out of the distro size


Re: Pig 0.7 Release Engineering and how to drop 18 MB out of the distro size
+1

This will greatly simplify (or rather, enable) the use of Pig from within
other systems (like Oozie), as it will allow proper component
dependency resolution.

Thanks.

Alejandro

On Thu, Dec 9, 2010 at 3:37 AM, Stephen Watt <[EMAIL PROTECTED]> wrote:

> Hi Folks
>
> I've been doing some release engineering around Pig 0.7 and thought I
> would share this in case any of you have it baked into a distribution.
> Using the current techniques, you can drop the current distro from 44MB to
> a runtime-only distro of 26MB. Also, if I've missed something, or anything
> I'm suggesting here has any negative ramifications, I'd love to know.
>
> 1) Delete everything in the lib directory, then copy the following files
> into it:
>      commons-el.jar
>      commons-httpclient-3.0.1.jar
>      commons-logging-1.0.4.jar
>      hadoop-0.20.2-core.jar
>      hbase-0.20.6.jar
>      hbase-0.20.6-test.jar
>      jline-0.9.94.jar
>      log4j-1.2.15.jar
> 2) Delete the Pig jars in $PIG_HOME except pig-0.7.1-dev-core.jar, and copy
> that jar into the lib directory
> 3) Add the following to bin/pig so that grunt still works:
>
> for f in $PIG_DIR/lib/*.jar; do
>    CLASSPATH=${CLASSPATH}:$f;
> done
>
> Lastly, some observations
>
> - According to its JIRA ticket, automaton.jar is part of Pig 0.8. What is
> the jar doing in Pig 0.7?
>
> - Those who ship Pig need to run legal scans on the software to ensure that
> all the dependencies (jars in the lib folder) have friendly licenses and can
> be shipped along with the base project. Creating files like Hadoop20.jar,
> where Hadoop, all of its dependencies, and a bunch of classes of
> undetermined origin are compiled into a single jar, makes this
> extremely difficult. I'd like to propose that future releases have an
> independent jar for each project in the lib directory. Otherwise, for each
> class we have to figure out what the project is (to determine its license)
> and what version it is, based on the package name and date of the classes.
>
> Regards
> Steve Watt
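
For anyone wanting to try the steps quoted above, here is a rough shell sketch. It is not taken from the Pig build or from bin/pig. The function names (copy_runtime_jars, build_classpath) and the KEEP_JARS variable are made up for illustration, and it copies the listed jars into a fresh directory instead of deleting files in place, which is a safer substitute for step 1 rather than what the message describes. The jar names are the ones listed in the message.

```shell
#!/bin/sh
# Sketch only: helper names and KEEP_JARS are hypothetical, not part of Pig.

# Jars named in step 1 of the quoted message.
KEEP_JARS="commons-el.jar commons-httpclient-3.0.1.jar commons-logging-1.0.4.jar hadoop-0.20.2-core.jar hbase-0.20.6.jar hbase-0.20.6-test.jar jline-0.9.94.jar log4j-1.2.15.jar"

# copy_runtime_jars SRC_LIB DEST_LIB
# Copies only the jars in KEEP_JARS from SRC_LIB into DEST_LIB.
# Assumption: copying to a new directory replaces the in-place delete.
copy_runtime_jars() {
    mkdir -p "$2" || return 1
    for name in $KEEP_JARS; do
        if [ -f "$1/$name" ]; then
            cp "$1/$name" "$2/" || return 1
        else
            # Report jars that were expected but not found.
            echo "missing: $name" >&2
        fi
    done
}

# build_classpath LIB_DIR
# Prints a colon-separated list of every jar in LIB_DIR, which is what
# the loop in step 3 appends to CLASSPATH.
build_classpath() {
    path_list=""
    for f in "$1"/*.jar; do
        # Skip the literal pattern when the directory has no jars.
        [ -e "$f" ] || continue
        path_list="${path_list:+$path_list:}$f"
    done
    printf '%s\n' "$path_list"
}
```

A possible use, assuming the trimmed jars should land in a new directory called runtime-lib: run copy_runtime_jars "$PIG_HOME/lib" "$PIG_HOME/runtime-lib", then set CLASSPATH="${CLASSPATH}:$(build_classpath "$PIG_HOME/runtime-lib")". Quoting the paths keeps the loop working when the install path contains spaces.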