Thanks, Lars!
In my case, the amount of data on disk is a lot lower, so I can make do with
fewer regions. Nevertheless, even if I set the flush size too large, the
memstore lowerLimit and memstore upperLimit will trigger flushes before we
need a lot of heap to support all the memstores. But then I will probably
get flushes before reaching the 600M limit.
I just found out that, for me, a 128M memstore produces an 8M HFile (the
file is fast_diff encoded), which sounds tiny to me. So I felt that I should
increase the flush size, since the output files will be small anyway. This
would help reduce compaction activity. But yes, to your comment above: with
a 600M flush size and 3G for all memstores, I can probably support around
5-10 regions per server. Otherwise, I will hit the 3G ceiling too soon and
memstore flushes will happen long before reaching the 600M limit.
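A rough sketch of that arithmetic in Python, in case it helps. The function
name is made up for illustration, and it assumes one memstore per region,
every region written to at once, and 1G = 1024M:

```python
def regions_before_forced_flush(memstore_budget_mb: float, flush_size_mb: float) -> float:
    """How many regions can fill their memstore to the flush size
    before the global memstore budget is exhausted."""
    return memstore_budget_mb / flush_size_mb


budget_mb = 3 * 1024  # 3G shared by all memstores on the region server

for flush_mb in (128, 600):
    regions = regions_before_forced_flush(budget_mb, flush_mb)
    print(f"{flush_mb}M flush size: about {regions:.0f} regions")
```

With 600M this lands at the low end of the 5-10 range; the higher end only
holds if not every region is taking writes at the same time.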
On Fri, Jan 18, 2013 at 4:45 AM, Chalcy Raja
> Looking forward to the blog!
> -----Original Message-----
> From: lars hofhansl [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, January 17, 2013 9:24 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Hbase heap size
> You'll need more memory then, or more machines with less disk each.
> You can look at it this way:
> - The largest useful region size is 20G (at least that is the current
> common tribal knowledge).
> - Each region has at least one memstore (one per column family actually,
> let's just say one for the sake of argument).
> If you have 10T disks per region server then you need ~170 regions per
> region server (3*20G*170 ~ 10T).
> If you give the memstore 35% of your heap and have 128M memstores you
> would need 170*128M/0.35 ~ 60G of heap. That's already too large.
> If you make the memstores 600M, you'll need 170*600M/0.35 ~ 290G of heap
> (if all memstores are being written to simultaneously).
> There are ways to address that.
> If you expect that not all memstores are written to at the same time, you
> can leave them smaller and increase their size multipliers, which allows
> them to be temporarily larger.
> Again, this is just back of the envelope.
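
(Inline note: the back-of-the-envelope numbers above can be reproduced with
a small Python sketch. The function names are hypothetical, and it assumes
decimal units (1T = 1000G, 1G = 1000M), one memstore per region, and all
memstores filling at the same time.)

```python
def regions_needed(disk_gb: float, region_gb: float = 20, replication: int = 3) -> float:
    """Regions per region server needed to fill the given raw disk space."""
    return disk_gb / (region_gb * replication)


def heap_needed_gb(num_regions: float, memstore_mb: float,
                   memstore_fraction: float = 0.35) -> float:
    """Heap required so that all memstores fit in the memstore share of the heap."""
    return num_regions * memstore_mb / 1000 / memstore_fraction


disk_gb = 10_000  # 10T of disk per region server
print(f"regions: ~{regions_needed(disk_gb):.0f}")

# Use the rounded figure of 170 regions from the text for both memstore sizes.
for memstore_mb in (128, 600):
    heap = heap_needed_gb(170, memstore_mb)
    print(f"{memstore_mb}M memstores: ~{heap:.0f}G of heap")
```
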
> This is a lengthy topic, I'm planning a blog post around this. There are a
> bunch of parameters that can be tweaked based on workload.
> The main takeaway for HBase is that you have to match disk space with
> Java heap.
> -- Lars
> From: Varun Sharma <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Thursday, January 17, 2013 3:24 PM
> Subject: Re: Hbase heap size
> Thanks for the info. I am looking for a balance where I have a write-heavy
> workload and need excellent read latency. So 40% to block cache for
> caching, 35% to memstore.
> But I would also like to reduce the number of HFiles and the amount of
> compaction activity. So I want fewer regions and a much larger
> memstore flush size, like 640M. Could a large memstore flush be a problem
> in some sense? Are updates blocked on memstore flush? In my case, I would
> expect a 600M sized memstore to materialize into a 200-300M sized HFile.
> On Thu, Jan 17, 2013 at 2:31 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> > A good rule of thumb that I found is to give each region server a Java
> > heap that is roughly 1/100th of the size of the disk space per region
> > server.
> > (that is assuming all the default settings: 10G regions, 128M
> > memstores, 40% of heap for memstores, 20% of heap for block cache,
> > 3-way replication)
> > That is, if you give the region server a 10G heap, you can expect to
> > be able to serve about 1T worth of disk space.
> > That can be tweaked of course (increase the region size to 20G; if
> > your load is mostly read-only, you can shrink the memstores, etc).
> > That way you can reduce that ratio to 1/200 or even less.
> > I'm sure other folks will have more detailed input.