Pig >> mail # user >> Trying to get pig 0.11/0.12 working to solve 0.10's issues with python udf


Michał Czerwiński 2012-11-12, 16:47
Cheolsoo Park 2012-11-12, 17:37
Michał Czerwiński 2012-11-12, 17:59
Cheolsoo Park 2012-11-12, 18:09
Michał Czerwiński 2012-11-12, 18:29
Cheolsoo Park 2012-11-12, 18:45
Michał Czerwiński 2012-11-13, 15:16
Michał Czerwiński 2012-11-13, 15:40
Cheolsoo Park 2012-11-13, 17:18
Re: Trying to get pig 0.11/0.12 working to solve 0.10's issues with python udf
Yeah, so just to be clear: under Pig > 0.10 the issue seems to be exactly as
you describe. In addition, the issue occurs whenever -Dpig.additional.jars
contains a directory path instead of a file path. This happens quite often
because forums advise including HIVE_HOME and HADOOP_HOME in PIG_CLASSPATH,
which is then passed to -Dpig.additional.jars.
I put a comment in the JIRA ticket.
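
One way to guard against this is to strip everything that is not a regular
file from the classpath before Pig sees it. The following is only a minimal
sketch under that assumption: filter_classpath_files is a hypothetical helper
name, not something shipped with Pig, and it simply drops directories, empty
entries, and missing paths from a colon-separated list.

```shell
# Hypothetical helper (not part of Pig): print only the entries of a
# colon-separated classpath that are existing regular files.
# The body runs in a subshell "( ... )", so the IFS and globbing changes
# below do not leak into the calling shell.
filter_classpath_files() (
    IFS=':'      # split the argument on colons
    set -f       # disable globbing so entries are taken literally
    output=""
    for entry in $1; do
        # Directories (e.g. a conf/ dir), empty entries, and missing
        # paths all fail the -f test and are skipped.
        if [ -f "$entry" ]; then
            output="${output:+$output:}$entry"
        fi
    done
    printf '%s\n' "$output"
)
```

It could be applied just before starting Pig, for example with
PIG_CLASSPATH="$(filter_classpath_files "$PIG_CLASSPATH")". Note that this
removes conf directories from the classpath altogether, which matches the
change that worked in this thread but may not suit every setup.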

Thanks again Cheolsoo!

On 13 November 2012 17:18, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Hi Michal,
>
> Thanks for sharing your workaround.
>
> I think that Pig should be able to handle empty file names in
> -Dpig.additional.jars, so users don't have to spend hours debugging problems
> like this. So I filed a JIRA:
> https://issues.apache.org/jira/browse/PIG-3046
>
> We will get this fixed in a future release.
>
> Thanks,
> Cheolsoo
>
> On Tue, Nov 13, 2012 at 7:40 AM, Michał Czerwiński <[EMAIL PROTECTED]> wrote:
>
> > Oh well, I changed
> > PIG_CLASSPATH="$HCAT_HOME/share/hcatalog/hcatalog-0.4.0.jar:$HIVE_HOME/conf:$HADOOP_HOME/conf"
> > into
> > PIG_CLASSPATH="$HCAT_HOME/share/hcatalog/hcatalog-0.4.0.jar"
> >
> > while still loading the Hive libraries via
> > for file in $HIVE_HOME/lib/*.jar; do
> >     #echo "==> Adding $file"
> >     PIG_CLASSPATH="$PIG_CLASSPATH:$file"
> > done
> >
> > and that seems to be working fine now. Thanks a lot for the help
> > debugging it!
> >
> > On 13 November 2012 15:16, Michał Czerwiński <[EMAIL PROTECTED]> wrote:
> >
> > > Right, it looks like this:
> > >
> > > 2012-11-13 15:13:57,100 [main] DEBUG org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Adding jar to DistributedCache: file:/opt/hcat/share/hcatalog/hcatalog-0.4.0.jar
> > > 2012-11-13 15:13:57,428 [main] DEBUG org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Adding jar to DistributedCache: file:/usr/lib/hive/conf/
> > > 2012-11-13 15:13:57,433 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
> > > Details at logfile: /opt/pig/trunk/pig_1352819617642.log
> > >
> > > #> ls -la /usr/lib/hive/conf/
> > > total 88
> > > drwxr-xr-x 2 root root  4096 2012-11-12 17:48 .
> > > drwxr-xr-x 8 root root  4096 2012-11-09 17:29 ..
> > > -rw-r--r-- 1 root root 39451 2012-11-08 10:24 hive-default.xml
> > > -rw-r--r-- 1 root root  1408 2012-11-08 11:22 hive-env.sh
> > > -rw-r--r-- 1 root root  1410 2012-11-08 10:24 hive-env.sh.template
> > > -rw-r--r-- 1 root root  1637 2012-11-08 10:24 hive-exec-log4j.properties
> > > -rw-r--r-- 1 root root  2005 2012-11-08 10:24 hive-log4j.properties
> > > -rw-r--r-- 1 root root  4055 2012-11-08 11:22 hive-site-client.xml.tpl
> > > -rw-rw-r-- 1 root root  4879 2012-11-09 15:30 hive-site.xml
> > > -rw-r--r-- 1 root root  4903 2012-11-09 15:30 hive-site.xml.PIG.tpl
> > > -rw-r--r-- 1 root root  3634 2012-11-08 11:22 hive-site.xml.tpl
> > >
> > > On 12 November 2012 18:45, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
> > >
> > >> Can you try printing debug messages by adding "-d DEBUG" to the Pig
> > >> command? It will print which additional files are added to the
> > >> distributed cache, as follows:
> > >>
> > >> 2012-11-12 10:41:58,908 [main] DEBUG org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Adding jar to DistributedCache: file:/home/cheolsoo/apache-ant-1.8.4/lib/ant-antlr.jar
> > >> 2012-11-12 10:41:59,099 [main] DEBUG org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Adding jar to DistributedCache: file:/etc/hadoop-0.20/conf.pseudo/
> > >>
> > >> This will tell you which file it was shipping right before it failed.
> > >> That will probably give you a hint on where to look further.
> > >>
> > >> Thanks,
> > >> Cheolsoo
> > >>
> > >>
> > >> On Mon, Nov 12, 2012 at 10:29 AM, Michał Czerwiński <