Re: Pig against HBase table - successful in local mode, fails in mapreduce mode (Tasktracker thinks ZK is localhost)
Is your HBase conf dir part of your Hadoop classpath?  HBase configuration
settings are not pushed down to the MapReduce task level by default, so a
task that cannot find hbase-site.xml falls back to the default ZooKeeper
quorum, localhost:

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
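
A minimal sketch of what that could look like, assuming the HBase conf dir is /etc/hbase/conf (the path from the quoted message) and that these lines go into hadoop-env.sh on each TaskTracker node. The exact file location and whether a TaskTracker restart is needed depend on your install, so treat this as an illustration rather than a verified fix:

```shell
# Assumption: HBase config lives in /etc/hbase/conf unless HBASE_CONF_DIR is already set.
HBASE_CONF_DIR="${HBASE_CONF_DIR:-/etc/hbase/conf}"

# Prepend the conf dir so hbase-site.xml is visible on the Hadoop classpath.
# The ${VAR:+...} form avoids a dangling ':' when HADOOP_CLASSPATH is empty.
export HADOOP_CLASSPATH="${HBASE_CONF_DIR}${HADOOP_CLASSPATH:+:${HADOOP_CLASSPATH}}"
```

If the tasks still report localhost afterwards, check that the change reached every TaskTracker node and not only the machine that submits the job.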

Norbert

On Tue, May 15, 2012 at 8:28 PM, Neil Yalowitz <[EMAIL PROTECTED]> wrote:

> I've created a simple HBase table (version 0.90.4-cdh3u3) and I'm
> attempting to query the contents with Pig (version 0.8.1-cdh3u3).
>
>
> grunt> A = load 'test' using
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a');
> grunt> dump A;
> (...)Success!
> myhbasevalue1
>
>
> This works when Pig runs in local mode, but when it is executed in
> mapreduce mode, the MR job fails with an all-too-familiar error message:
>
>
>    org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately
>
>
> To make this work with Pig in local mode, I followed suggestions I found via
> a web search and added the HBase classpath to PIG_CLASSPATH:
>
>
> added to:  /usr/lib/pig/bin/pig
>
> export JAVA_HOME=/usr/java/latest
> export HBASE_HOME=/usr/lib/hbase
> export PIG_CLASSPATH="`${HBASE_HOME}/bin/hbase classpath`:$PIG_CLASSPATH"
>
>
> added to: /etc/hbase/conf/hbase-site.xml
>
> <property>
>  <name>hbase.zookeeper.quorum</name>
>  <value>myzookeeper1</value>
> </property>
>
>
> So again, this works with Pig in local mode.  To make my job run in
> mapreduce mode, I added a target HDFS and JobTracker service to the Pig
> properties:
>
>
> added to: /etc/pig/conf/pig.properties
>
> fs.default.name=hdfs://my-mr-cluster/
> mapred.job.tracker=my-mr-cluster:8021
>
>
> When I run the query again on the actual MR cluster, the job fails with the
> ZooKeeper exception I mentioned above.
>
> When I examine the job.xml (in the MR dashboard as well as in the temporary
> TaskTracker cache), I see that hbase.zookeeper.quorum is correctly set
> (myzookeeper1).  However, when I arbitrarily select a TaskTracker node and
> examine the TT logs, I see that the TaskTracker thinks the ZK quorum is
> "localhost".
>
> Any ideas?  This is mindbending.
>
>
> Neil Yalowitz
> [EMAIL PROTECTED]
>