Pig >> mail # user >> Trying to get pig 0.11/0.12 working to solve 0.10's issues with python udf


Thread:
Michał Czerwiński 2012-11-12, 16:47
Cheolsoo Park 2012-11-12, 17:37
Michał Czerwiński 2012-11-12, 17:59
Cheolsoo Park 2012-11-12, 18:09
Michał Czerwiński 2012-11-12, 18:29
Cheolsoo Park 2012-11-12, 18:45
Michał Czerwiński 2012-11-13, 15:16
Michał Czerwiński 2012-11-13, 15:40
Cheolsoo Park 2012-11-13, 17:18
Re: Trying to get pig 0.11/0.12 working to solve 0.10's issues with python udf
Yeah, so just to be clear: under Pig > 0.10 the issue seems to be exactly as
you describe. The issue occurs whenever you specify a directory path instead
of a file path in -Dpig.additional.jars. This happens quite often because
forums commonly advise including HIVE_HOME and HADOOP_HOME in PIG_CLASSPATH,
which is then passed to -Dpig.additional.jars.
I put a comment in the JIRA ticket.

Thanks again Cheolsoo!
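For anyone hitting the same problem, a minimal sketch of the jar-only approach described above: build the additional-jars list from individual .jar files and never pass a directory. The HIVE_HOME default and the script name are placeholders, not taken from the thread:

```shell
# Build pig.additional.jars from individual jar files only; passing a
# directory (e.g. $HIVE_HOME/conf) is what triggers ERROR 2017.
HIVE_HOME="${HIVE_HOME:-/usr/lib/hive}"   # placeholder default path
ADDITIONAL_JARS=""
for file in "$HIVE_HOME"/lib/*.jar; do
    [ -f "$file" ] || continue            # skip if the glob matched nothing
    # Append with a colon separator, but no leading colon on the first jar.
    ADDITIONAL_JARS="${ADDITIONAL_JARS:+$ADDITIONAL_JARS:}$file"
done
echo "$ADDITIONAL_JARS"
# Then run, e.g.: pig -Dpig.additional.jars="$ADDITIONAL_JARS" myscript.pig
```

The `${var:+...}` expansion avoids a leading colon, which would otherwise add an empty entry to the list — the very case PIG-3046 is about.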

On 13 November 2012 17:18, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Hi Michal,
>
> Thanks for sharing your workaround.
>
> I think that Pig should be able to handle empty file names in
> -Dpig.additional.jars, so users don't have to spend hours debugging problems
> like this. So I filed a JIRA:
> https://issues.apache.org/jira/browse/PIG-3046
>
> We will get this fixed in a future release.
>
> Thanks,
> Cheolsoo
>
> On Tue, Nov 13, 2012 at 7:40 AM, Michał Czerwiński <[EMAIL PROTECTED]> wrote:
>
> > Oh well, I changed
> > PIG_CLASSPATH="$HCAT_HOME/share/hcatalog/hcatalog-0.4.0.jar:$HIVE_HOME/conf:$HADOOP_HOME/conf"
> > into
> > PIG_CLASSPATH="$HCAT_HOME/share/hcatalog/hcatalog-0.4.0.jar"
> >
> > having still hive libraries loaded via
> > for file in $HIVE_HOME/lib/*.jar; do
> >     #echo "==> Adding $file"
> >     PIG_CLASSPATH="$PIG_CLASSPATH:$file"
> > done
> >
> > and that seems to be working fine now, thanks a lot for the help
> > debugging it!
> >
> > On 13 November 2012 15:16, Michał Czerwiński <[EMAIL PROTECTED]> wrote:
> >
> > > Right, it looks like that:
> > >
> > > 2012-11-13 15:13:57,100 [main] DEBUG org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Adding jar to DistributedCache: file:/opt/hcat/share/hcatalog/hcatalog-0.4.0.jar
> > > 2012-11-13 15:13:57,428 [main] DEBUG org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Adding jar to DistributedCache: file:/usr/lib/hive/conf/
> > > 2012-11-13 15:13:57,433 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
> > > Details at logfile: /opt/pig/trunk/pig_1352819617642.log
> > >
> > > #> ls -la /usr/lib/hive/conf/
> > > total 88
> > > drwxr-xr-x 2 root root  4096 2012-11-12 17:48 .
> > > drwxr-xr-x 8 root root  4096 2012-11-09 17:29 ..
> > > -rw-r--r-- 1 root root 39451 2012-11-08 10:24 hive-default.xml
> > > -rw-r--r-- 1 root root  1408 2012-11-08 11:22 hive-env.sh
> > > -rw-r--r-- 1 root root  1410 2012-11-08 10:24 hive-env.sh.template
> > > -rw-r--r-- 1 root root  1637 2012-11-08 10:24 hive-exec-log4j.properties
> > > -rw-r--r-- 1 root root  2005 2012-11-08 10:24 hive-log4j.properties
> > > -rw-r--r-- 1 root root  4055 2012-11-08 11:22 hive-site-client.xml.tpl
> > > -rw-rw-r-- 1 root root  4879 2012-11-09 15:30 hive-site.xml
> > > -rw-r--r-- 1 root root  4903 2012-11-09 15:30 hive-site.xml.PIG.tpl
> > > -rw-r--r-- 1 root root  3634 2012-11-08 11:22 hive-site.xml.tpl
> > >
> > > On 12 November 2012 18:45, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
> > >
> > >> Can you try to print out debug message by adding "-d DEBUG" to the Pig
> > >> command? It will print which additional files are added to distributed
> > >> cache as follows:
> > >>
> > >> 2012-11-12 10:41:58,908 [main] DEBUG org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Adding jar to DistributedCache: file:/home/cheolsoo/apache-ant-1.8.4/lib/ant-antlr.jar
> > >> 2012-11-12 10:41:59,099 [main] DEBUG org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Adding jar to DistributedCache: file:/etc/hadoop-0.20/conf.pseudo/
> > >>
> > >> This will tell you which file it was shipping right before it failed.
> > >> That will probably give you a hint on where to look further.
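For reference, one way the suggested debug invocation might look on the command line; the script name and jar path below are placeholders, not from the thread:

```shell
# '-d DEBUG' raises the console log level so JobControlCompiler logs each
# file it adds to the distributed cache. Placeholder script and jar names.
pig -d DEBUG -Dpig.additional.jars="$HIVE_HOME/lib/hive-exec.jar" myscript.pig
```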
> > >>
> > >> Thanks,
> > >> Cheolsoo
> > >>
> > >>
> > >> On Mon, Nov 12, 2012 at 10:29 AM, Michał Czerwiński <