Pig >> mail # user

Pig against HBase table - successful in local mode, fails in mapreduce mode (Tasktracker thinks ZK is localhost)
I've created a simple HBase table (version 0.90.4-cdh3u3) and I'm
attempting to query the contents with Pig (version 0.8.1-cdh3u3).
grunt> A = load 'test' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a');
grunt> dump A;
(...)Success!
myhbasevalue1
This works when Pig runs in local mode, but when it is executed in
mapreduce mode, the MR job fails with an all-too-familiar error message:
    org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately
To make this work with Pig in local mode, I followed suggestions I found via
a web search and added the HBase classpath to PIG_CLASSPATH.
Added to /usr/lib/pig/bin/pig:

export JAVA_HOME=/usr/java/latest
export HBASE_HOME=/usr/lib/hbase
export PIG_CLASSPATH="`${HBASE_HOME}/bin/hbase classpath`:$PIG_CLASSPATH"
Added to /etc/hbase/conf/hbase-site.xml:

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>myzookeeper1</value>
</property>
So again, this works with Pig in local mode.  To make my job run in
mapreduce mode, I added the target HDFS and JobTracker addresses to the Pig
properties.
Added to /etc/pig/conf/pig.properties:

fs.default.name=hdfs://my-mr-cluster/
mapred.job.tracker=my-mr-cluster:8021
When I run the query again on the actual MR cluster, the job fails with the
ZooKeeper exception I mentioned above.

When I examine the job.xml (in the MR dashboard as well as in the temporary
TaskTracker cache), I see that hbase.zookeeper.quorum is correctly set
(myzookeeper1).  However, when I arbitrarily select a TaskTracker node and
examine the TT logs, I see that the TaskTracker thinks the ZK quorum is
"localhost".
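
For anyone comparing nodes: HBase's shipped default for hbase.zookeeper.quorum is "localhost", so one thing worth ruling out is whether the code running on the TaskTracker reads an hbase-site.xml at all. The sketch below is not a confirmed diagnosis of this problem. It is a hypothetical helper (effective_quorum is a made-up name, not part of HBase or Pig) that mimics the lookup: take the value from a site file's text if present, otherwise fall back to the default. The sample XML mirrors the property shown earlier in this message.

```python
import xml.etree.ElementTree as ET

# Default from hbase-default.xml; used when no site file overrides it.
HBASE_DEFAULT_QUORUM = "localhost"


def effective_quorum(site_xml_text=None):
    """Return hbase.zookeeper.quorum from the text of a Hadoop-style
    site file, or the HBase default if the file or property is missing.

    Hypothetical helper for illustration only.
    """
    if site_xml_text is None:
        # No hbase-site.xml visible to this process.
        return HBASE_DEFAULT_QUORUM
    root = ET.fromstring(site_xml_text)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "hbase.zookeeper.quorum":
            value = (prop.findtext("value") or "").strip()
            return value or HBASE_DEFAULT_QUORUM
    return HBASE_DEFAULT_QUORUM


# Same property as in /etc/hbase/conf/hbase-site.xml above.
SITE_XML = """<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>myzookeeper1</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print("with hbase-site.xml:   ", effective_quorum(SITE_XML))
    print("without hbase-site.xml:", effective_quorum(None))
```

Running the same kind of check against the contents of /etc/hbase/conf/hbase-site.xml on the client and on a TaskTracker node would show whether both resolve to myzookeeper1 or whether one of them falls back to the default.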

Any ideas?  This is mindbending.
Neil Yalowitz
[EMAIL PROTECTED]