Pig >> mail # user >> Pig against HBase table - successful in local mode, fails in mapreduce mode (Tasktracker thinks ZK is localhost)


Pig against HBase table - successful in local mode, fails in mapreduce mode (Tasktracker thinks ZK is localhost)
I've created a simple HBase table (version 0.90.4-cdh3u3) and I'm
attempting to query the contents with Pig (version 0.8.1-cdh3u3).
grunt> A = load 'test' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a');
grunt> dump A;
(...)Success!
myhbasevalue1
This works when pig runs in local mode, but when it is executed in
mapreduce mode, the MR job fails with an all-too-familiar error message:
    org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to
connect to ZooKeeper but the connection closes immediately
To make this work with pig + local mode, I followed suggestions I found via
a web search and added the HBase classpath to PIG_CLASSPATH:
added to:  /usr/lib/pig/bin/pig

export JAVA_HOME=/usr/java/latest
export HBASE_HOME=/usr/lib/hbase
export PIG_CLASSPATH="`${HBASE_HOME}/bin/hbase classpath`:$PIG_CLASSPATH"
added to: /etc/hbase/conf/hbase-site.xml

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>myzookeeper1</value>
</property>
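As a sanity check on the setting above, here is a minimal sketch that prints the value a given site file actually carries for a property. The helper name get_site_property is made up for illustration, and it assumes the <name> and <value> elements sit on adjacent lines, as they do in the snippet above.

```shell
# Print the value of a named property from a Hadoop-style *-site.xml file.
# Hypothetical helper; assumes <value> is on the line right after <name>.
get_site_property() {
  file="$1"
  name="$2"
  # -F treats the property name literally (the dots are not regex wildcards).
  grep -F -A1 "<name>${name}</name>" "$file" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}
```

For example, `get_site_property /etc/hbase/conf/hbase-site.xml hbase.zookeeper.quorum` should print myzookeeper1 on the client described here.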
So again, this works with pig in local mode.  To make my job run in
mapreduce mode, I add a target HDFS and Jobtracker service to the pig
properties:
added to: /etc/pig/conf/pig.properties

fs.default.name=hdfs://my-mr-cluster/
mapred.job.tracker=my-mr-cluster:8021
When I run the query again on the actual MR cluster, the job fails with the
Zookeeper exception I mentioned above.

When I examine the job.xml (in the MR dashboard as well as in the temporary
taskTracker cache) I see the hbase.zookeeper.quorum is correctly set
(myzookeeper1).  However, when I arbitrarily select a Tasktracker node and
examine the TT logs, I see that the Tasktracker thinks the ZK is
"localhost".
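
Since "localhost" is also HBase's built-in default for hbase.zookeeper.quorum, one check that might help narrow this down is whether a given classpath contains a directory that holds hbase-site.xml at all. The sketch below is only a diagnostic under that assumption, not a confirmed cause; the function name find_hbase_site_on_classpath is hypothetical.

```shell
# List classpath entries that are directories containing hbase-site.xml.
# Hypothetical helper; takes a colon-separated classpath string as $1.
find_hbase_site_on_classpath() {
  # Split on ':' via tr so wildcard entries (e.g. lib/*) are never glob-expanded.
  printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
    if [ -d "$entry" ] && [ -f "$entry/hbase-site.xml" ]; then
      printf '%s\n' "$entry"
    fi
  done
}
```

On the client this could be run as `find_hbase_site_on_classpath "$PIG_CLASSPATH"`. Empty output for the classpath a task actually runs with would suggest that side never sees the hbase-site.xml shown earlier.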

Any ideas?  This is mindbending.
Neil Yalowitz
[EMAIL PROTECTED]