Zebeljan, Nebojsa
2012-10-10, 12:23
Bill Graham
2012-10-11, 00:59
Cheolsoo Park
2012-10-11, 04:30
Zebeljan, Nebojsa
2012-10-11, 07:29
Zebeljan, Nebojsa
2012-10-11, 07:46
Cheolsoo Park
2012-10-11, 19:06
-
Hadoop Job History Loader with PIG
Zebeljan, Nebojsa
2012-10-10, 12:23
Hi,

I'm using CDH 4.0.1 with pig-0.9.2+26.

I've tried to gather some information about my result files aggregated by Pig with the HadoopJobHistoryLoader(), as described here:
http://archive.cloudera.com/cdh/3/pig/piglatin_ref1.html#Hadoop+Job+History+Loader

Running a simple Pig script returns:
"ERROR 1070: Could not resolve org.apache.pig.piggybank.storage.HadoopJobHistoryLoader using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]"

Following up on this error, I found that the HadoopJobHistoryLoader class does not exist in the piggybank!

According to the API docs, this class should exist:
http://pig.apache.org/docs/r0.9.2/api/org/apache/pig/piggybank/storage/HadoopJobHistoryLoader.html

Can someone please enlighten me?

Thanks!

Regards,
Nebo
-
Re: Hadoop Job History Loader with PIG
Bill Graham
2012-10-11, 00:59
Are you sure you have the piggybank jar in your classpath?

Here's the source FYI, so it certainly exists:
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/HadoopJobHistoryLoader.java

And here it is on the pig 0.9 branch:
http://svn.apache.org/repos/asf/pig/branches/branch-0.9/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/HadoopJobHistoryLoader.java
-
Re: Hadoop Job History Loader with PIG
Cheolsoo Park
2012-10-11, 04:30
Hi Nebojsa,

Did you register piggybank.jar in your Pig script?

REGISTER <path_to_piggybank.jar>;

In CDH4.0.1, piggybank.jar can be found at /usr/lib/pig/contrib/piggybank/java/piggybank.jar.

Thanks,
Cheolsoo
-
Re: Hadoop Job History Loader with PIG
Zebeljan, Nebojsa
2012-10-11, 07:29
Hi Cheolsoo,

Yes, I've registered the piggybank jar in the Pig script. See the script below.

---
REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar;

-- Load the job history as three maps per job (j, m, r)
a = load '/some_dir/some_aggregation/_logs/history' using org.apache.pig.piggybank.storage.HadoopJobHistoryLoader() as (j:map[], m:map[], r:map[]);
b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' as script_name,
        (Long) j#'SUBMIT_TIME' as start, (Long) j#'FINISH_TIME' as end;
c = group b by (id, user, script_name);
-- Time from first submit to last finish per script, divided by 1000
d = foreach c generate group.user, group.script_name, (MAX(b.end) - MIN(b.start))/1000;
dump d;
---

I've also downloaded Pig from Cloudera version 4.0.1 again and grepped the piggybank.jar for the "HadoopJobHistoryLoader" class, but I still can't find the class.

I also grepped /usr/lib/pig/contrib/piggybank/java/piggybank.jar, with the same result.

What am I doing wrong here?

Thanks for any help!
Nebo
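For anyone repeating the check described in this message, here is a minimal sketch of one way to test a jar listing for a class. The helper name `listing_has_class` is made up for illustration and is not part of Pig or the JDK. It reads the output of `jar -tf` on stdin, so it can be tried without a real jar.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the jar listing read from stdin
# contains an entry for the given fully qualified class name.
listing_has_class() {
    # Turn org.example.Foo into the entry name org/example/Foo.class
    entry="$(printf '%s' "$1" | tr '.' '/').class"
    # -x matches the whole line, -F treats the entry as a fixed string
    grep -qxF "$entry"
}

# Usage against a real jar (path taken from this thread):
# jar -tf /usr/lib/pig/contrib/piggybank/java/piggybank.jar \
#   | listing_has_class org.apache.pig.piggybank.storage.HadoopJobHistoryLoader \
#   && echo found || echo missing
```

Matching the full entry name avoids false positives from inner classes or similarly named files, which a plain substring grep could report.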
-
Re: Hadoop Job History Loader with PIG
Zebeljan, Nebojsa
2012-10-11, 07:46
Hi Cheolsoo,

I've found the reason why "HadoopJobHistoryLoader" is not available. In Cloudera's distro the class is excluded when building the piggybank:

-> ./contrib/piggybank/java/build.xml
-> ./cloudera/patches/0001-CLOUDERA-BUILD.-CDHifying-Pig-0.9.1-build.patch

---
<!-- JobHistoryLoader currently does not support 0.23 -->
<condition property="build.classes.excludes" value="**/HadoopJobHistoryLoader.java" else="">
    <equals arg1="${hadoopversion}" arg2="23"/>
</condition>
<condition property="test.classes.excludes" value="**/TestHadoopJobHistoryLoader.java" else="">
    <equals arg1="${hadoopversion}" arg2="23"/>
</condition>
---

Do you know if this "exclude" is still needed for hadoop-2.x?

Thanks in advance!
Nebo
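To make the effect of the `<condition>` blocks quoted in this message concrete, here is a rough shell sketch of the same decision. It is not how Ant evaluates the build file, and the function name `build_excludes` is made up for illustration. It only mirrors the logic that the exclude pattern is set when `hadoopversion` equals 23 and is empty otherwise.

```shell
#!/bin/sh
# Illustrative only: mirrors <condition property="build.classes.excludes" ...>
# from the quoted build.xml. build_excludes is a hypothetical name.
build_excludes() {
    if [ "$1" = "23" ]; then
        # hadoopversion=23: the loader source matches the exclude pattern
        printf '%s\n' '**/HadoopJobHistoryLoader.java'
    else
        # any other value: the else="" branch, nothing is excluded
        printf '\n'
    fi
}

build_excludes 23   # prints **/HadoopJobHistoryLoader.java
build_excludes 20   # prints an empty line
```

Assuming `build.classes.excludes` is what the compile step uses for its excludes, a piggybank built with `-Dhadoopversion=23` would not contain the class, while any other value leaves the exclude list empty.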
-
Re: Hadoop Job History Loader with PIG
Cheolsoo Park
2012-10-11, 19:06
Hi Nebojsa,

You're absolutely right. CDH4.x compiles everything against hadoop-2.0.x, so HadoopJobHistoryLoader is excluded. Thank you very much for pointing that out.

This is a packaging bug as I see it, and I am going to get it fixed in the next release. In the meantime, could you apply the patch that I added at the end and build piggybank.jar from the source tarball yourself?

1) wget http://archive.cloudera.com/cdh4/cdh/4/pig-0.9.2-cdh4.0.1.tar.gz
2) tar -xf pig-0.9.2-cdh4.0.1.tar.gz
3) cd pig-0.9.2-cdh4.0.1
4) patch -p0 -i <this patch>
5) ant clean compile-test jar-withouthadoop -Dhadoopversion=23
6) cd contrib/piggybank/java
7) ant clean jar -Dhadoopversion=20 -Dmr1.test=mr1

Now you will find piggybank.jar built in the current directory, and it contains HadoopJobHistoryLoader as follows:

8) jar -tvf piggybank.jar | grep HadoopJobHistoryLoader
  1866 Thu Oct 11 11:20:40 PDT 2012 org/apache/pig/piggybank/storage/HadoopJobHistoryLoader$1.class
  1885 Thu Oct 11 11:20:40 PDT 2012 org/apache/pig/piggybank/storage/HadoopJobHistoryLoader$HadoopJobHistoryInputFormat.class
  5769 Thu Oct 11 11:20:40 PDT 2012 org/apache/pig/piggybank/storage/HadoopJobHistoryLoader$HadoopJobHistoryReader.class
   943 Thu Oct 11 11:20:40 PDT 2012 org/apache/pig/piggybank/storage/HadoopJobHistoryLoader$JobHistoryPathFilter.class
  3460 Thu Oct 11 11:20:40 PDT 2012 org/apache/pig/piggybank/storage/HadoopJobHistoryLoader$JobKeys.class
  2681 Thu Oct 11 11:20:40 PDT 2012 org/apache/pig/piggybank/storage/HadoopJobHistoryLoader$JobXMLHandler.class
   751 Thu Oct 11 11:20:40 PDT 2012 org/apache/pig/piggybank/storage/HadoopJobHistoryLoader$MRJobInfo.class
 16364 Thu Oct 11 11:20:40 PDT 2012 org/apache/pig/piggybank/storage/HadoopJobHistoryLoader.class

You can also run the unit test as follows:

9) ant clean test -Dhadoopversion=20 -Dmr1.test=mr1 -Dtestcase=TestHadoopJobHistoryLoader

Please let me know if this works for you.

Thanks!
Cheolsoo

diff --git contrib/piggybank/java/build.xml contrib/piggybank/java/build.xml
index b162dbd..1616e38 100755
--- contrib/piggybank/java/build.xml
+++ contrib/piggybank/java/build.xml
@@ -15,7 +15,15 @@
     limitations under the License.
 -->

-<project basedir="." default="jar" name="pigudf">
+<project basedir="." default="jar" name="pigudf"
+         xmlns:artifact="urn:maven-artifact-ant"
+         xmlns:ivy="antlib:org.apache.ivy.ant">
+    <taskdef resource="net/sf/antcontrib/antcontrib.properties">
+        <classpath>
+            <pathelement location="../../../cloudera/maven-packaging/lib/ant-contrib-1.0b3.jar"/>
+        </classpath>
+    </taskdef>
+
+    <!-- javac properties -->
     <property name="javac.debug" value="on" />
     <property name="javac.level" value="source,lines,vars"/>
@@ -39,6 +47,17 @@
     <property name="hsqldb.jar" value="../../../build/ivy/lib/Pig/hsqldb-1.8.0.10.jar"/>
     <property name="ivy.lib.dir" value="../../../build/ivy/lib/Pig"/>

+    <property name="src.shims.dir" value="../../../shims/src/hadoop${hadoopversion}" />
+    <if>
+        <equals arg1="${mr1.test}" arg2="mr1"/>
+        <then>
+            <property name="src.shims.test.dir" value="../../../shims/test/hadoop20" />
+        </then>
+        <else>
+            <property name="src.shims.test.dir" value="../../../shims/test/hadoop${hadoopversion}" />
+        </else>
+    </if>
+
     <!-- JobHistoryLoader currently does not support 0.23 -->
     <condition property="build.classes.excludes" value="**/HadoopJobHistoryLoader.java" else="">
         <equals arg1="${hadoopversion}" arg2="23"/>
@@ -59,14 +78,99 @@
     <property name="test.src.dir" value="src/test/java" />
     <property name="junit.hadoop.conf" value="${user.home}/pigtest/conf/"/>

-    <path id="pigudf.classpath">
-        <pathelement location="${build.classes}"/>
-        <pathelement location="${pigjar-withouthadoop}"/>
-        <pathelement location="${pigtest}"/>
-        <fileset dir="../../../build/ivy/lib">
-            <include name="**/*.jar"/>
-        </fileset>
-    </path>
+    <property name="ivy.dir" location="../../../ivy" />
+    <property name="build.ivy.dir" location="${build.dir}/ivy" />
+    <property name="build.ivy.lib.dir" location="${build.ivy.dir}/lib" />
+    <property name="ivy.lib.dir" location="${build.ivy.lib.dir}/${ant.project.name}"/>
+    <property name="build.ivy.report.dir" location="${build.ivy.dir}/report" />
+    <property name="build.ivy.maven.dir" location="${build.ivy.dir}/maven" />
+    <property name="build.ivy.maven.pom" location="${build.ivy.maven.dir}/pig-${version}.pom" />
+    <property name="build.ivy.maven.jar" location="${build.ivy.maven.dir}/pig-${version}-core.jar" />
+
+    <loadproperties srcfile="${ivy.dir}/libraries.properties"/>
+    <property name="ivysettings.xml" location="${ivy.dir}/ivysettings.xml" />
+    <property name="ivy.jar" location="${ivy.dir}/ivy-${ivy.version}.jar"/>
+    <property name="mvnrepo" value="http://repo2.maven.org/maven2"/>
+    <property name="ivy_repo_url" value="${mvnrepo}/org/apache/ivy/ivy/${ivy.version}/ivy-${ivy.version}.jar"/>
+
+    <target name="ivy-init-dirs">
+        <mkdir dir="${build.ivy.dir}" />
+        <mkdir dir="${build.ivy.lib.dir}" />
+        <mkdir dir="${build.ivy.report.dir}" />
+        <mkdir dir="${build.ivy.maven.dir}" />
+        <copy todir="${basedir}/" file="../../../ivy.xml" />
+    </target>
+
+    <target name="ivy-probe-antlib" >
+        <condition property="ivy.found">
+            <typefound uri="antlib:org.apache.ivy.ant" name="cleancache"/>
+        </condition>
+    </target>
+
+    <target name="ivy-download" description="To download ivy" unless="offline">
+        <get src="${ivy_repo_url}" dest="${ivy.jar}" usetimestamp="true"/>
+    </target>
+
+    <!--
+         To avoid Ivy leaking things across big projects, always load Ivy in the same classloader.
+         Also note how we skip loading Ivy if it is already there, just to make sure all is well.
+    -->
+    <target name="ivy-init