HBase, mail # user - HBase issues since upgrade from 0.92.4 to 0.94.6


Re: HBase issues since upgrade from 0.92.4 to 0.94.6
David Koch 2013-07-12, 16:15
Hello,

This is the command that is used to launch the region servers:

/usr/java/jdk1.7.0_25/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m
-Djava.net.preferIPv4Stack=true -Xmx1073741824 -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
-XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
-Dhbase.log.dir=/var/log/hbase
-Dhbase.log.file=hbase-cmf-hbase1-REGIONSERVER-big-4.ezakus.net.log.out
-Dhbase.home.dir=/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hbase
-Dhbase.id.str= -Dhbase.root.logger=INFO,RFA -Djava.library.path=<... libs
...>

So it seems garbage collection logging is not activated. I can try to
re-launch with the -verbose:gc flag.
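
For reference, a minimal sketch of how GC logging could be switched on for the
region servers. It assumes the standard HotSpot flags available in JDK 7 and
the HBASE_REGIONSERVER_OPTS variable read from conf/hbase-env.sh. The log file
name is a hypothetical choice (only the /var/log/hbase directory comes from the
command above), and on a Cloudera Manager install the same flags would more
likely go into the role's Java options field than into a hand-edited file.

```shell
# Hypothetical GC log location; only the directory appears in the launch command.
GC_LOG="/var/log/hbase/gc-regionserver.log"

# -verbose:gc alone prints one line per collection; the extra flags add
# per-generation details and wall-clock timestamps and write to a file.
GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:${GC_LOG}"

# Append to whatever options are already set instead of overwriting them.
HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS:-} ${GC_OPTS}"
export HBASE_REGIONSERVER_OPTS

echo "$HBASE_REGIONSERVER_OPTS"
```

After a restart, the resulting file should show whether long CMS pauses or
full collections precede the point where the region server disappears.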

All HBase settings are left at their (CDH 4.3) defaults, for example:
hfile.block.cache.size=0.25
hbase.hregion.max.filesize=1GB

except:
hbase.hregion.majorcompaction=0

Speculative execution is off.
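
As an illustration only, this is roughly how speculative execution can be
turned off per job from the command line with the MR1 property names. The jar
name, driver class and input path are hypothetical placeholders, and the -D
options are only honoured when the job driver goes through
ToolRunner/GenericOptionsParser; otherwise the same properties have to be set
in the job configuration or in mapred-site.xml.

```shell
# Hypothetical job artifacts; replace with the real jar, driver class and input.
JOB_JAR="${JOB_JAR:-my-import-job.jar}"
JOB_CLASS="${JOB_CLASS:-com.example.ImportJob}"
INPUT_PATH="${INPUT_PATH:-/input/path}"

# Disable speculative attempts for both map and reduce tasks, so that no
# duplicate task attempt writes the same rows to HBase.
SPEC_OPTS="-Dmapred.map.tasks.speculative.execution=false"
SPEC_OPTS="${SPEC_OPTS} -Dmapred.reduce.tasks.speculative.execution=false"

# Build and print the command instead of running it, so it can be reviewed first.
JOB_CMD="hadoop jar ${JOB_JAR} ${JOB_CLASS} ${SPEC_OPTS} ${INPUT_PATH}"
echo "$JOB_CMD"
```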

The only solution we have found so far is lowering the workload by running
fewer jobs in parallel.

/David
On Fri, Jul 12, 2013 at 1:48 PM, Azuryy Yu <[EMAIL PROTECTED]> wrote:

> I do think your JVM on the RS crashed. Do you have a GC log?
>
> Did you set mapred.map.tasks.speculative.execution=false for the MR jobs
> that read from or write to HBase?
>
> And if you have a heavy read/write load, how did you tune HBase? For
> example, block cache size, compaction, memstore, etc.
>
>
> On Fri, Jul 12, 2013 at 7:42 PM, David Koch <[EMAIL PROTECTED]> wrote:
>
> > Thank you for your responses. With respect to the version of Java, I found
> > that Cloudera recommends 1.7.x for CDH4.3:
> > http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Requirements-and-Supported-Versions/cdhrsv_topic_3.html
> >
> >
> > On Fri, Jul 12, 2013 at 1:32 PM, Jean-Marc Spaggiari <
> > [EMAIL PROTECTED]> wrote:
> >
> > > Might want to run memtest also, just to be sure there is no memory issue.
> > > There should not be one, since it was working fine with 0.92.4, but it
> > > costs nothing...
> > >
> > > The last version of Java 6 is 45... It might also be worth giving it a
> > > try if you are running with 1.6.
> > >
> > > 2013/7/12 Asaf Mesika <[EMAIL PROTECTED]>
> > >
> > > > You need to look at the JVM crash in the .out log file and see whether
> > > > it is the native Hadoop .so code that is causing the problem. In our
> > > > case we downgraded from JVM 1.6.0-37 to 33 and that solved the issue.
> > > >
> > > >
> > > > On Friday, July 12, 2013, David Koch wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > NOTE: I posted the same message in the Cloudera group.
> > > > >
> > > > > Since upgrading from CDH 4.0.1 (HBase 0.92.4) to 4.3.0 (HBase 0.94.6) we
> > > > > systematically experience problems with region servers crashing silently
> > > > > under workloads which used to pass without problems. More specifically, we
> > > > > run about 30 Mapper jobs in parallel which read from HDFS and insert into
> > > > > HBase.
> > > > >
> > > > > region server log
> > > > > NOTE: no trace of crash, but server is down and shows up as such in
> > > > > Cloudera Manager.
> > > > >
> > > > > 2013-07-12 10:22:12,050 WARN org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: File hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXX%2C60020%2C1373616547696.1373617004286 might be still open, length is 0
> > > > > 2013-07-12 10:22:12,051 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Recovering file hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXXt%2C60020%2C1373616547696.1373617004286
> > > > > 2013-07-12 10:22:13,064 INFO org.apache.hadoop.hbase.util.FSHDFSUtils: Finished lease recover attempt for hdfs://XXXXXXX:8020/hbase/.logs/XXXXXXX,60020,1373616547696-splitting/XXXXXXX%2C60020%2C1373616547696.1373617004286