HBase, mail # user - Re: long garbage collecting pause

Re: long garbage collecting pause
Marcos Ortiz 2012-10-02, 13:38

On 01/10/2012 16:35, Greg Ross wrote:
> Hi,
> I'm having difficulty with a mapreduce job that has reducers that read
> from and write to HBase, version 0.92.1, r1298924. Row sizes vary
> greatly. As do the number of cells, although the number of cells is
> typically numbered in the tens, at most. The max cell size is 1MB.
0.94.1 is out with a lot of performance improvements. It would be
better if you used this version.
> I see the following in the logs followed by the region server promptly
> shutting down:
> 2012-10-01 19:08:47,858 [regionserver60020] WARN
> org.apache.hadoop.hbase.util.Sleeper: We slept 28970ms instead of
> 3000ms, this is likely due to a long garbage collecting pause and it's
> usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> The full logs, including GC are below.
> Although new to HBase, I've read up on the likely GC issues and their
> remedies. I've implemented the recommended solutions and still to no
> avail.
> Here's what I've tried:
> (1) increased the RAM to 4G
What is the exact size of your RAM?
> (2) set -XX:+UseConcMarkSweepGC
> (3) set -XX:+UseParNewGC
> (4) set -XX:CMSInitiatingOccupancyFraction=N where I've attempted N=[40..70]
> (5) I've called context.progress() in the reducer before and after
> reading and writing
> (6) memstore is enabled
> Is there anything else that I might have missed?
> Thanks,
> Greg
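For reference, the JVM settings listed above are normally combined in
conf/hbase-env.sh, and adding GC logging makes it possible to see whether the
pause is a CMS concurrent-mode failure. A sketch, assuming a 4 GB heap and a
HotSpot JVM of that era (heap size, occupancy fraction, and log path are
assumptions to tune for your hardware):

```shell
# conf/hbase-env.sh -- sketch, not a drop-in config.
# Heap size, CMS threshold, and log path below are assumptions.
export HBASE_REGIONSERVER_OPTS="-Xms4g -Xmx4g \
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/var/log/hbase/gc-regionserver.log"
```

-XX:+UseCMSInitiatingOccupancyOnly keeps the JVM from second-guessing the
fraction you set, and the GC log will show exactly how long each pause was.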
I'm seeing in the HBase logs that there are a lot of block requests.
Can you send us the output of jps?
Can you check the filesystem's health with hadoop fsck?
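Both checks can be run from the region server host; a sketch (output will vary
with your cluster, and the fsck flags assume a 1.x-era Hadoop CLI):

```shell
# List the JVM processes on this node; HRegionServer and DataNode
# should both appear if the node is healthy.
jps

# Check HDFS health; -files -blocks also reports missing,
# corrupt, and under-replicated blocks.
hadoop fsck / -files -blocks
```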

Another thing that I'm seeing is that one of your main activities is
compaction, so you can optimize all of this by increasing the size of your
regions (by default, the size of a region is 256 MB). But you can end up
with a "split/compaction storm" on your hands, as Lars calls them in
his book.

Instead of using the default mechanism for region splitting and compaction,
you can turn it off and do it manually with the split and major_compact
commands.

You can also evaluate using compression in your cluster to save a lot of
space on your region servers.
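Compression is set per column family; a sketch using the HBase shell (the
family name 'U' comes from the logs above, and SNAPPY assumes the native
Snappy libraries are installed on every region server -- GZ works out of
the box but is slower):

```shell
# On 0.92 the table must be disabled before altering a family.
echo "disable 'orwell_events'" | hbase shell
echo "alter 'orwell_events', {NAME => 'U', COMPRESSION => 'SNAPPY'}" | hbase shell
echo "enable 'orwell_events'" | hbase shell
```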

What is the size of your cluster?
You can use SPM, Ganglia, or OpenTSDB to monitor your cluster constantly.
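For Ganglia, the 0.92-era hookup is a metrics context in
conf/hadoop-metrics.properties on each HBase node; a sketch (the gmetad host
and port are assumptions, and GangliaContext31 is for Ganglia 3.1+):

```properties
# conf/hadoop-metrics.properties -- sketch; host/port are assumptions.
hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
hbase.period=10
hbase.servers=ganglia-gmetad.example.com:8649

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=ganglia-gmetad.example.com:8649
```

The jvm context is worth enabling here, since it charts GC time and heap
usage -- exactly what you need to correlate with these pauses.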

Best wishes
> hbase logs
> ======
> 2012-10-01 19:09:48,293
> [regionserver60020-largeCompactions-1348577979539] INFO
> org.apache.hadoop.hbase.regionserver.Store: Renaming compacted file at
> hdfs://namenode301.ngpipes.milp.ngmoco.com:9000/hbase/orwell_events/a9906c96a91bb8d7e62a7a528bf0ea5c/.tmp/d2ee47650b224189b0c27d1c20929c03
> to hdfs://namenode301.ngpipes.milp.ngmoco.com:9000/hbase/orwell_events/a9906c96a91bb8d7e62a7a528bf0ea5c/U/d2ee47650b224189b0c27d1c20929c03
> 2012-10-01 19:09:48,884
> [regionserver60020-largeCompactions-1348577979539] INFO
> org.apache.hadoop.hbase.regionserver.Store: Completed major compaction
> of 5 file(s) in U of
> orwell_events,00321084,1349118541283.a9906c96a91bb8d7e62a7a528bf0ea5c.
> into d2ee47650b224189b0c27d1c20929c03, size=723.0m; total size for
> store is 723.0m
> 2012-10-01 19:09:48,884
> [regionserver60020-largeCompactions-1348577979539] INFO
> org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest:
> completed compaction:
> regionName=orwell_events,00321084,1349118541283.a9906c96a91bb8d7e62a7a528bf0ea5c.,
> storeName=U, fileCount=5, fileSize=1.4g, priority=2,
> time=10631266687564968; duration=35sec
> 2012-10-01 19:09:48,886
> [regionserver60020-largeCompactions-1348577979539] INFO
> org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on U
> in region orwell_events,00316914,1349118541283.9740f22a42e9e8b6aca3966c0173e680.
> 2012-10-01 19:09:48,887
> [regionserver60020-largeCompactions-1348577979539] INFO
> org.apache.hadoop.hbase.regionserver.Store: Starting compaction of 5
> file(s) in U of
> orwell_events,00316914,1349118541283.9740f22a42e9e8b6aca3966c0173e680.
> into tmpdir=hdfs://namenode301.ngpipes.milp.ngmoco.com:9000/hbase/orwell_events/9740f22a42e9e8b6aca3966c0173e680/.tmp,
Marcos Ortiz Valmaseda,
Data Engineer && Senior System Administrator at UCI
Blog: http://marcosluis2186.posterous.com
Linkedin: http://www.linkedin.com/in/marcosluis2186
Twitter: @marcosluis2186

