Marcos Ortiz 2012-10-02, 13:38
Damien Hardy 2012-10-02, 14:20
Greg Ross 2012-10-02, 15:32
Re: long garbage collecting pause
Marcos Ortiz 2012-10-02, 17:37
On 02/10/2012 11:32, Greg Ross wrote:
> Thanks for the suggestions.
>
> I was attempting to tune the GC via mapred.child.java.opts in the job's
> Oozie config instead of in hbase-env.sh. I think this is why my efforts
> were to no avail. It was likely having no effect on the read/write
> performance. Is there any way of specifying job-specific HBase parameters
> instead of globally setting them in hbase-env.sh?
>
> The cluster has 175 nodes, each with 48GB of RAM. The overall data input
> size is 7TB and I pre-split the table into initially 30 regions, then 100
> in another attempt. Each job runs upon 700GB chunks of the data. I used
> RegionSplitter to create and condition the table and therefore there's
> currently no compression. I'm thinking of recreating the table and
> 'alter'-ing it with LZO compression before attempting the jobs again.

There are many things you can do to tune HBase performance. Chapter 11 of
Lars George's book "HBase: The Definitive Guide" is dedicated to this tricky
topic, and the HBase book has good advice too:
http://hbase.apache.org/book.html#perf.reading

Thanks to Doug for the link.

> Cheers.
>
> Greg
>
> On Tue, Oct 2, 2012 at 7:20 AM, Damien Hardy <[EMAIL PROTECTED]> wrote:
>
>> Hello
>>
>> 2012/10/2 Marcos Ortiz <[EMAIL PROTECTED]>
>>
>>> Another thing that I'm seeing is that one of your main processes is
>>> compaction, so you can optimize all this by increasing the size of
>>> your regions (by default the size of a region is 256 MB), but you
>>> will have on your hands a "split/compaction storm", as Lars called
>>> them in his book.
>>
>> Actually it seems like the default value for hbase.hregion.max.filesize in
>> 0.92 was increased to 1 GB.
>> http://hbase.apache.org/book/upgrade0.92.html#d2051e266
>>
>> But you can set it higher (max is 20 GB) and split manually.
>> http://hbase.apache.org/book/important_configurations.html#bigger.regions
>>
>> Cheers,
>>
>> --
>> Dam

--
Marcos Ortiz Valmaseda,
Data Engineer && Senior System Administrator at UCI
Blog: http://marcosluis2186.posterous.com
Linkedin: http://www.linkedin.com/in/marcosluis2186
Twitter: @marcosluis2186
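To put the region-size figures quoted in this message side by side, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the table ends up holding about as much data as the 7TB input (no compression, no storage overhead), which the thread does not confirm. The helper name `regions_needed` is made up for this example and is not part of HBase.

```python
# Rough sizing sketch. Assumption: stored table size ~= 7TB input size
# (uncompressed, ignoring HFile/KeyValue overhead). Not an HBase API.

MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def regions_needed(table_bytes: int, max_filesize_bytes: int) -> int:
    """Smallest region count such that no region exceeds max_filesize_bytes."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-table_bytes // max_filesize_bytes)


def avg_region_gb(table_bytes: int, region_count: int) -> float:
    """Average region size in GB if the data is spread evenly."""
    return table_bytes / region_count / GB


table_bytes = 7 * TB  # assumed stored size, taken from the 7TB input figure

# hbase.hregion.max.filesize values mentioned in the thread
for label, max_size in [("256 MB", 256 * MB), ("1 GB", 1 * GB), ("20 GB", 20 * GB)]:
    print(f"max filesize {label}: at least {regions_needed(table_bytes, max_size)} regions")

# Pre-split counts mentioned in the thread
for presplit in (30, 100):
    print(f"{presplit} pre-split regions: ~{avg_region_gb(table_bytes, presplit):.1f} GB each")
```

Under that assumption, 100 pre-split regions would average roughly 72 GB each, well above even the 20 GB setting, so automatic splits (and the compactions that follow them) would be expected during the load unless the table is pre-split much more finely or split manually.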
Michael Segel 2012-10-02, 14:23