Pig >> mail # user >> Pig against HBase table - successful in local mode, fails in mapreduce mode (Tasktracker thinks ZK is localhost)


Neil Yalowitz 2012-05-16, 00:28
Norbert Burger 2012-05-16, 00:34
Neil Yalowitz 2012-05-16, 03:39
Re: Pig against HBase table - successful in local mode, fails in mapreduce mode (Tasktracker thinks ZK is localhost)
Great - glad to hear you're up and running.

On Tue, May 15, 2012 at 11:39 PM, Neil Yalowitz <[EMAIL PROTECTED]> wrote:

> > Is your HBase conf dir part of your Hadoop classpath? HBase configuration
> > settings are not pushed down to the mapreduce task level by default
>
> This was the problem.  I was setting the classpath on the machine where the
> Pig query was being executed but not on the MR cluster nodes which were
> executing the job.
>
> The MR cluster in this case is managed by Cloudera's cluster tool (Cloudera
> Manager), which regenerates the conf files upon service restart.  Configuring
> the correct target ZooKeeper cluster-wide required adding the following to a
> specific override field in Cloudera Manager under the Mapred service (the
> "Mapreduce Service Configuration Safety Valve" field) and then restarting
> the MR service:
>
> <property>
>  <name>hbase.zookeeper.quorum</name>
>  <value>myzookeeper1</value>
> </property>
>
>
> Thanks Norbert, that was the exact tip I needed.
>
>
> Neil Yalowitz
> [EMAIL PROTECTED]
>
> 2012-05-15 22:41:06,157 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:java.class.path=/etc/hbase/conf....(etc)...
>
>
>
>
> On Tue, May 15, 2012 at 8:34 PM, Norbert Burger <[EMAIL PROTECTED]> wrote:
>
> > Is your HBase conf dir part of your Hadoop classpath?  HBase configuration
> > settings are not pushed down to the mapreduce task level by default:
> >
> > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
> >
> > Norbert
> >
> > On Tue, May 15, 2012 at 8:28 PM, Neil Yalowitz <[EMAIL PROTECTED]> wrote:
> >
> > > I've created a simple HBase table (version 0.90.4-cdh3u3) and I'm
> > > attempting to query the contents with Pig (version 0.8.1-cdh3u3).
> > >
> > >
> > > grunt> A = load 'test' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a');
> > > grunt> dump A;
> > > (...)Success!
> > > myhbasevalue1
> > >
> > >
> > > This works when pig runs in local mode, but when it is executed in
> > > mapreduce mode, the MR job fails with an all-too-familiar error message:
> > >
> > >
> > >    org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately
> > >
> > >
> > > To make this work with pig + local mode, I followed suggestions I found via
> > > a web search and added the HBase classpath to PIG_CLASSPATH:
> > >
> > >
> > > added to:  /usr/lib/pig/bin/pig
> > >
> > > export JAVA_HOME=/usr/java/latest
> > > export HBASE_HOME=/usr/lib/hbase
> > > export PIG_CLASSPATH="`${HBASE_HOME}/bin/hbase classpath`:$PIG_CLASSPATH"
> > >
> > >
> > > added to: /etc/hbase/conf/hbase-site.xml
> > >
> > > <property>
> > >  <name>hbase.zookeeper.quorum</name>
> > >  <value>myzookeeper1</value>
> > > </property>
> > >
> > >
> > > So again, this works with pig in local mode.  To make my job run in
> > > mapreduce mode, I add a target HDFS and Jobtracker service to the pig
> > > properties:
> > >
> > >
> > > added to: /etc/pig/conf/pig.properties
> > >
> > > fs.default.name=hdfs://my-mr-cluster/
> > > mapred.job.tracker=my-mr-cluster:8021
> > >
> > >
> > > When I run the query again on the actual MR cluster, the job fails with the
> > > ZooKeeper exception I mentioned above.
> > >
> > > When I examine the job.xml (in the MR dashboard as well as in the temporary
> > > taskTracker cache) I see the hbase.zookeeper.quorum is correctly set
> > > (myzookeeper1).  However, when I arbitrarily select a Tasktracker node and
> > > examine the TT logs, I see that the Tasktracker thinks the ZK is
> > > "localhost".
> > >
> > > Any ideas?  This is mindbending.
> > >
> > >
> > > Neil Yalowitz
> > > [EMAIL PROTECTED]
> > >
> >
>
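
The diagnosis in this thread (no hbase-site.xml visible on the task nodes, so the client falls back to a "localhost" quorum) can be approximated with a small check run on any node. This is only a rough sketch, not how HBase itself resolves configuration: the function names are hypothetical, the "localhost" fallback is assumed to match HBase's bundled default, and the script only looks for hbase-site.xml in directory entries of a classpath string passed to it.

```python
import os
import xml.etree.ElementTree as ET

# Assumption: with no hbase-site.xml on the classpath, the client falls back
# to a default quorum of "localhost" (the symptom seen in the Tasktracker logs).
DEFAULT_QUORUM = "localhost"


def find_hbase_site(classpath):
    """Return the first hbase-site.xml found in a directory on the classpath, else None."""
    for entry in classpath.split(os.pathsep):
        if not entry:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(entry, "hbase-site.xml")
        if os.path.isfile(candidate):
            return candidate
    return None


def effective_quorum(classpath):
    """Approximate the hbase.zookeeper.quorum a client would see with this classpath."""
    site = find_hbase_site(classpath)
    if site is None:
        return DEFAULT_QUORUM
    for prop in ET.parse(site).getroot().iter("property"):
        if (prop.findtext("name") or "").strip() == "hbase.zookeeper.quorum":
            value = (prop.findtext("value") or "").strip()
            return value or DEFAULT_QUORUM
    return DEFAULT_QUORUM


if __name__ == "__main__":
    # Hypothetical usage: pass the classpath via the CLASSPATH environment variable.
    print(effective_quorum(os.environ.get("CLASSPATH", "")))
```

Running it once on the machine that submits the Pig query and once on a Tasktracker node, each with that node's own classpath, should show whether the two disagree, which is the mismatch described above.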