Pig, mail # user - Pig against HBase table - successful in local mode, fails in mapreduce mode (Tasktracker thinks ZK is localhost)


Neil Yalowitz 2012-05-16, 00:28
Re: Pig against HBase table - successful in local mode, fails in mapreduce mode (Tasktracker thinks ZK is localhost)
Norbert Burger 2012-05-16, 00:34
Is your HBase conf dir part of your Hadoop classpath?  HBase configuration
settings are not pushed down to the mapreduce task level by default:

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
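
One way to do this, following the classpath section of the page linked above, is
to add the HBase conf dir to HADOOP_CLASSPATH in hadoop-env.sh on the cluster
nodes and restart the TaskTrackers. The lines below are only a sketch: the
/etc/hbase/conf default is taken from the paths in the original message, and
HBASE_CONF_DIR is a variable name assumed here for illustration.

```shell
# Sketch for hadoop-env.sh on each TaskTracker node (paths are assumptions).
# HBASE_CONF_DIR is a name chosen for this example; the default mirrors the
# /etc/hbase/conf location mentioned in the original message.
HBASE_CONF_DIR="${HBASE_CONF_DIR:-/etc/hbase/conf}"

# Prepend the HBase conf dir so hbase-site.xml can be found on the classpath.
# The ${VAR:+...} form avoids a trailing ':' when HADOOP_CLASSPATH was empty.
export HADOOP_CLASSPATH="${HBASE_CONF_DIR}${HADOOP_CLASSPATH:+:${HADOOP_CLASSPATH}}"

# Print the result so the change can be checked before restarting anything.
echo "HADOOP_CLASSPATH=${HADOOP_CLASSPATH}"
```

If hbase-site.xml is not visible to the tasks, the HBase client falls back to its
default quorum of localhost, which would match what the TaskTracker logs show.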

Norbert

On Tue, May 15, 2012 at 8:28 PM, Neil Yalowitz <[EMAIL PROTECTED]> wrote:

> I've created a simple HBase table (version 0.90.4-cdh3u3) and I'm
> attempting to query the contents with Pig (version 0.8.1-cdh3u3).
>
>
> grunt> A = load 'test' using
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a');
> grunt> dump A;
> (...)Success!
> myhbasevalue1
>
>
> This works when pig runs in local mode, but when it is executed in
> mapreduce mode, the MR job fails with an all-too-familiar error message:
>
>
>    org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to
> connect to ZooKeeper but the connection closes immediately
>
>
> To make this work with pig + local mode, I followed suggestions I found via
> a web search and added the HBase classpath to PIG_CLASSPATH:
>
>
> added to:  /usr/lib/pig/bin/pig
>
> export JAVA_HOME=/usr/java/latest
> export HBASE_HOME=/usr/lib/hbase
> export PIG_CLASSPATH="`${HBASE_HOME}/bin/hbase classpath`:$PIG_CLASSPATH"
>
>
> added to: /etc/hbase/conf/hbase-site.xml
>
> <property>
>  <name>hbase.zookeeper.quorum</name>
>  <value>myzookeeper1</value>
> </property>
>
>
> So again, this works with pig in local mode.  To make my job run in
> mapreduce mode, I added the target HDFS and JobTracker addresses to the pig
> properties:
>
>
> added to: /etc/pig/conf/pig.properties
>
> fs.default.name=hdfs://my-mr-cluster/
> mapred.job.tracker=my-mr-cluster:8021
>
>
> When I run the query again on the actual MR cluster, the job fails with the
> ZooKeeper exception I mentioned above.
>
> When I examine the job.xml (in the MR dashboard as well as in the temporary
> TaskTracker cache) I see that hbase.zookeeper.quorum is correctly set
> (myzookeeper1).  However, when I arbitrarily select a TaskTracker node and
> examine the TT logs, I see that the TaskTracker thinks the ZK quorum is
> "localhost".
>
> Any ideas?  This is mindbending.
>
>
> Neil Yalowitz
> [EMAIL PROTECTED]
>
Neil Yalowitz 2012-05-16, 03:39
Norbert Burger 2012-05-16, 04:34