On 02/10/2012 11:32, Greg Ross wrote:
> Thanks for the suggestions.
> I was attempting to tune the GC via mapred.child.java.opts in the job's
> Oozie config instead of in hbase-env.sh. I think this is why my efforts
> were to no avail. It was likely having no effect on the read/write
> performance. Is there any way of specifying job-specific HBase parameters
> instead of globally setting them in hbase-env.sh?
> The cluster has 175 nodes, each with 48GB of RAM. The overall data input
> size is 7TB. I initially pre-split the table into 30 regions, then into 100
> in another attempt. Each job runs on 700GB chunks of the data. I used
> RegionSplitter to create and condition the table, and therefore there's
> currently no compression. I'm thinking of recreating the table and using
> 'alter' to enable LZO compression before attempting the jobs again.
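For context on the pre-split: as far as I know, RegionSplitter's default HexStringSplit algorithm places split points evenly across a fixed-width hex key space (00000000 to FFFFFFFF). Below is a minimal Python sketch of that idea, not RegionSplitter's actual code; the helper name `hex_split_points` is hypothetical, and the split only balances load if the row keys really are uniformly distributed hex strings (for example, hashed keys).

```python
def hex_split_points(num_regions: int, width: int = 8) -> list[str]:
    """Hypothetical helper: return num_regions - 1 evenly spaced split keys.

    Assumes row keys are uniformly distributed lowercase hex strings of
    `width` characters, e.g. 00000000..ffffffff for width=8.
    """
    if num_regions < 2:
        return []  # a single region needs no split keys
    key_space = 16 ** width          # size of the whole hex key range
    step = key_space // num_regions  # width of each region's key slice
    return [format(i * step, f"0{width}x") for i in range(1, num_regions)]


# Second attempt from the thread: 100 regions means 99 boundary keys.
splits = hex_split_points(100)
print(len(splits), splits[0], splits[-1])

# If keys are uniform, each 700GB job chunk lands roughly evenly:
print(700 / 100, "GB per region per job")
```

If the real keys are not hex-like or are skewed, evenly spaced boundaries will still leave hot regions, so it may be worth sampling the actual key distribution first.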
There are many things you can do to tune HBase performance.
Chapter 11 of Lars George's book "HBase: The Definitive Guide" is dedicated
to this tricky topic, and the HBase book has good material on it too:
Thanks to Doug for the link.
> On Tue, Oct 2, 2012 at 7:20 AM, Damien Hardy <[EMAIL PROTECTED]> wrote:
>> 2012/10/2 Marcos Ortiz <[EMAIL PROTECTED]>
>>> Another thing that I'm seeing is that one of your main processes is [...]
>>> so you can optimize all this by increasing the size of your regions (by
>>> default the size of a region is 256 MB), but then you will have a
>>> "split/compaction storm" on your hands, as Lars calls it in his book.
>> Actually, it seems the default value of hbase.hregion.max.filesize was
>> increased to 1 GB in 0.92.
>> But you can set it higher (the max is 20 GB) and split manually.
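To see why the region size matters at this scale, here is a rough back-of-the-envelope sketch in Python. It assumes a single column family, treats 7TB as binary terabytes of uncompressed data, and ignores that freshly split regions start at roughly half the limit (so real counts run higher) and that LZO would shrink the numbers. The helper name `min_region_count` is hypothetical.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def min_region_count(data_bytes: int, max_filesize_bytes: int) -> int:
    """Hypothetical helper: lower bound on region count, assuming a region
    splits once its (single) store grows past hbase.hregion.max.filesize."""
    return math.ceil(data_bytes / max_filesize_bytes)


data = 7 * TB  # overall input size mentioned in the thread, uncompressed
nodes = 175    # cluster size mentioned in the thread

# Old default, 0.92 default, and the maximum mentioned above.
for max_filesize in (256 * MB, 1 * GB, 20 * GB):
    n = min_region_count(data, max_filesize)
    print(f"{max_filesize // MB:>6} MB -> at least {n} regions, "
          f"{n / nodes:.1f} per node on {nodes} nodes")
```

Larger regions mean far fewer regions per node, but each split or major compaction then has much more data to rewrite, which is the trade-off behind the "split/compaction storm" warning and the advice to split manually.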
Marcos Ortiz Valmaseda,
Data Engineer && Senior System Administrator at UCI