You'll need more memory then, or more machines with not much disk attached.
You can look at it this way:
- The largest useful region size is 20G (at least that is the current common tribal knowledge).
- Each region has at least one memstore (one per column family actually, let's just say one for the sake of argument).
If you have 10T of disk per region server, then you need ~170 regions per region server (3*20G*170 ~ 10T, with 3-way replication).
If you give the memstores 35% of your heap and have 128M memstores, you would need 170*128M/0.35 ~ 60G of heap. That's already too large.
If you make the memstores 600M, you'll need 170*600M/0.35 ~ 290G of heap (if all memstores are being written to simultaneously).
There are ways to address that.
If you expect that not all memstores are written to at the same time, you can leave them smaller and increase their size multipliers, which allows them to be temporarily larger.
Again, this is just back of the envelope.
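The back-of-the-envelope arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, it assumes one memstore per region, that every memstore fills up at the same time, and decimal units (1G = 1000M, 1T = 1000G).

```python
def regions_per_server(disk_gb, region_gb, replication=3):
    """Regions needed to fill the disk, counting each region `replication` times."""
    return disk_gb / (replication * region_gb)


def heap_needed_gb(regions, memstore_mb, memstore_fraction):
    """Heap needed so that `memstore_fraction` of it holds one full memstore per region."""
    total_memstore_gb = regions * memstore_mb / 1000  # decimal units: 1G = 1000M
    return total_memstore_gb / memstore_fraction


# 10T of disk, 20G regions, 3-way replication: roughly the ~170 regions used above.
print(f"regions per server: {regions_per_server(10_000, 20):.0f}")

# 170 regions with 35% of the heap given to memstores.
print(f"heap with 128M memstores: {heap_needed_gb(170, 128, 0.35):.0f}G")
print(f"heap with 600M memstores: {heap_needed_gb(170, 600, 0.35):.0f}G")
```

If only some regions take writes at any one time, pass the number of actively written regions instead of 170 to see how much the heap requirement drops.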
This is a lengthy topic; I'm planning a blog post around this. There are a bunch of parameters that can be tweaked based on workload.
The main takeaway for HBase is that you have to match disk space with Java heap.
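One way to see how heap and disk have to match is to compute the heap-to-disk ratio for a single region. The sketch below is a rough illustration only: the function names are hypothetical, it assumes one full memstore per region and decimal units, and it plugs in the defaults quoted further down in this thread (10G regions, 128M memstores, 40% of heap for memstores, 3-way replication).

```python
def heap_to_disk_ratio(region_gb, memstore_mb, memstore_fraction, replication=3):
    """Heap needed per unit of raw disk, assuming one full memstore per region."""
    heap_per_region_gb = memstore_mb / 1000 / memstore_fraction  # 1G = 1000M
    disk_per_region_gb = replication * region_gb
    return heap_per_region_gb / disk_per_region_gb


def servable_disk_gb(heap_gb, ratio):
    """Raw disk a region server can serve with the given heap at that ratio."""
    return heap_gb / ratio


default_ratio = heap_to_disk_ratio(region_gb=10, memstore_mb=128, memstore_fraction=0.40)
big_region_ratio = heap_to_disk_ratio(region_gb=20, memstore_mb=128, memstore_fraction=0.40)

print(f"defaults:    1/{1 / default_ratio:.0f} of disk as heap")
print(f"20G regions: 1/{1 / big_region_ratio:.0f} of disk as heap")
print(f"10G heap serves about {servable_disk_gb(10, default_ratio):.0f}G of disk")
```

Doubling the region size halves the ratio, and shrinking the memstores lowers it further, which is why the tuning knobs trade write buffering against how much disk one heap can cover.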
From: Varun Sharma <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
Sent: Thursday, January 17, 2013 3:24 PM
Subject: Re: Hbase heap size
Thanks for the info. I am looking for a balance where I have a write-heavy
workload and need excellent read latency. So 40% of heap goes to the block
cache for caching and 35% to the memstores.
But I would also like to reduce the number of HFiles and the amount of
compaction activity. So I want a small number of regions and a much larger
memstore flush size, like 640M. Could a large memstore flush be a problem
in some sense? Are updates blocked during a memstore flush? In my case, I would
expect a 600M memstore to materialize into a 200-300M HFile.
On Thu, Jan 17, 2013 at 2:31 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> A good rule of thumb that I found is to give each region server a Java
> heap that is roughly 1/100th of the size of the disk space per region server
> (that is assuming all the default settings: 10G regions, 128M memstores,
> 40% of heap for memstores, 20% of heap for block cache, 3-way replication).
> That is, if you give the region server a 10G heap, you can expect to be
> able to serve about 1T worth of disk space.
> That can be tweaked of course (increase the region size to 20G; if your
> load is mostly read-only, shrink the memstores; etc.).
> That way you can reduce that ratio to 1/200 or even less.
> I'm sure other folks will have more detailed input.
> -- Lars
> From: Varun Sharma <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Thursday, January 17, 2013 1:15 PM
> Subject: Hbase heap size
> I was wondering how much memory folks typically give to HBase and how much
> they leave for the file system cache on the region server. I am using HBase
> 0.94 and running only the region server and data node daemons. I have a
> system with 15G of RAM.