Re: HBaseStorage not working
This is a config/classpath issue, no?  At the lowest level, Hadoop MR tasks
don't pick up settings from the HBase conf directory unless they're
explicitly added to the classpath, usually via hadoop/conf/hadoop-env.sh:

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
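
For example, the kind of addition that page describes would look roughly like
this in hadoop/conf/hadoop-env.sh (the paths here are only assumptions, taken
from the jars registered further down this thread):

  # make hbase-site.xml and the HBase jar visible to MapReduce tasks
  export HBASE_HOME=/opt/hbase/hbase-trunk
  export HADOOP_CLASSPATH="$HBASE_HOME/conf:$HBASE_HOME/hbase-0.95-SNAPSHOT.jar:$HADOOP_CLASSPATH"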

Perhaps the classpath that's being added to your Java jobs is slightly
different?

Norbert

On Wed, May 2, 2012 at 6:48 AM, Royston Sellman <
[EMAIL PROTECTED]> wrote:

> Hi,
>
> We are still experiencing 40-60 minutes of task failures before our
> HBaseStorage jobs run, but we think we've narrowed the problem down to a
> specific ZooKeeper issue.
>
> The HBaseStorage map task only works when it lands on a machine that is
> actually running a ZooKeeper server as part of the quorum. The task is
> typically attempted on several different nodes in the cluster, failing
> repeatedly before it lands on a ZooKeeper node.
>
> Logs show the failing task attempts are trying to connect to the localhost
> machine on port 2181 to make a ZooKeeper connection (as part of the
> Load/HBaseStorage map task):
>
> ...
> > 2012-04-24 11:57:27,441 INFO org.apache.zookeeper.ClientCnxn: Opening
> > socket connection to server /127.0.0.1:2181
> ...
> > java.net.ConnectException: Connection refused
> ...
>
> This explains why the job succeeds eventually, as we have a zookeeper
> quorum
> server running on one of our worker nodes, but not on the other 3.
> Therefore, the job fails repeatedly until it is redistributed onto the node
> with the ZK server, at which point it succeeds immediately.
>
> We therefore suspect the issue is in our ZK configuration. Our
> hbase-site.xml defines the zookeeper quorum as follows:
>
>    <property>
>      <name>hbase.zookeeper.quorum</name>
>      <value>namenode,jobtracker,slave0</value>
>    </property>
>
> Therefore, we would expect the tasks to connect to one of those hosts when
> attempting a ZooKeeper connection; however, they appear to be attempting to
> connect to "localhost" (which is the default). It is as if the HBase
> configuration settings here are not being used.
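>
> (For reference, the kind of classpath setup we assume should make those
> settings visible to Pig looks like this; the conf path is only an assumption
> based on where the HBase jars live in the script below:
>
>    export PIG_CLASSPATH=/opt/hbase/hbase-trunk/conf:$PIG_CLASSPATH
>    pig -x mapreduce ../pig-scripts/hbaseuploadtest.pig
>
> Should that be enough for hbase-site.xml to be picked up, or does it also
> need to be on the classpath of the task tracker nodes?)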
>
> Does anyone have any suggestions as to what might be the cause of this
> behaviour?
>
> Sending this to both lists although it is only Pig HBaseStorage jobs that
> suffer this problem on our cluster. HBase Java client jobs work normally.
>
> Thanks,
> Royston
>
> -----Original Message-----
> From: Subir S [mailto:[EMAIL PROTECTED]]
> Sent: 24 April 2012 13:29
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: HBaseStorage not working
>
> Looping HBase group.
>
> On Tue, Apr 24, 2012 at 5:18 PM, Royston Sellman <
> [EMAIL PROTECTED]> wrote:
>
> > We still haven't cracked this, but a bit more info (HBase 0.95; Pig 0.11):
> >
> > The script below runs fine in a few seconds using Pig in local mode,
> > but with Pig in MR mode it only sometimes runs quickly and usually
> > takes 40 minutes to an hour.
> >
> > --hbaseuploadtest.pig
> > register /opt/hbase/hbase-trunk/lib/protobuf-java-2.4.0a.jar
> > register /opt/hbase/hbase-trunk/lib/guava-r09.jar
> > register /opt/hbase/hbase-trunk/hbase-0.95-SNAPSHOT.jar
> > register /opt/zookeeper/zookeeper-3.4.3/zookeeper-3.4.3.jar
> > raw_data = LOAD '/data/sse.tbl1.HEADERLESS.csv' USING PigStorage( ',' )
> >   AS (mid : chararray, hid : chararray, mf : chararray, mt : chararray,
> >       mind : chararray, mimd : chararray, mst : chararray );
> > dump raw_data;
> > STORE raw_data INTO 'hbase://hbaseuploadtest' USING
> >   org.apache.pig.backend.hadoop.hbase.HBaseStorage(
> >     'info:hid info:mf info:mt info:mind info:mimd info:mst');
> >
> > i.e.
> > [hadoop1@namenode hadoop-1.0.2]$ pig -x local
> > ../pig-scripts/hbaseuploadtest.pig
> > WORKS EVERY TIME!!
> > But
> > [hadoop1@namenode hadoop-1.0.2]$ pig -x mapreduce
> > ../pig-scripts/hbaseuploadtest.pig
> > Sometimes (but rarely) runs in under a minute, often takes more than
> > 40 minutes to get to 50% but then completes to 100% in seconds. The