HBase >> mail # user >> Hbase heap size
Re: Hbase heap size
Thanks, Lars !

In my case, the amount of data on disk is a lot lower, so I can do with
fewer regions. Nevertheless, even if I set the flush size too large, the
memstore lowerLimit and upperLimit will cause flushes before we need a lot
of heap to support all the memstores. But then I will probably get flushes
before reaching the 600M limit.

I just found out that a 128M memstore gives me an 8M HFile (the file is
fast_diff encoded), which seems tiny to me. So I felt that I should increase
the flush size, since the output files will be small anyway. This would help
reduce compaction activity. But then yes, to your comment above: with a 600M
flush size and 3G for all memstores, I can probably support around 5-10
regions per server. Otherwise, I will hit the 3G ceiling too soon and
memstore flushes will happen long before reaching the 600M limit.
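
To put rough numbers on that, here is a small Python sketch of the arithmetic. The function name is made up for illustration, and it assumes one memstore per region with every memstore being written to at the same time, which is the worst case.

```python
def regions_before_global_flush(memstore_budget_mb, flush_size_mb):
    """Number of memstores that can each grow to flush_size_mb
    before the shared memstore budget is used up.

    Assumption: one memstore per region, all written simultaneously.
    """
    return memstore_budget_mb // flush_size_mb


budget_mb = 3 * 1024  # 3G set aside for all memstores

print(regions_before_global_flush(budget_mb, 600))  # 5
print(regions_before_global_flush(budget_mb, 128))  # 24
```

With more regions than that being written at once, the global limit is what triggers flushes, not the per-region flush size.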

On Fri, Jan 18, 2013 at 4:45 AM, Chalcy Raja <[EMAIL PROTECTED]> wrote:

> Looking forward to the blog!
>
> Thanks,
> Chalcy
>
> -----Original Message-----
> From: lars hofhansl [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, January 17, 2013 9:24 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Hbase heap size
>
> You'll need more memory then, or more machines with not much disk
> attached.
>
> You can look at it this way:
> - The largest useful region size is 20G (at least that is the current
> common tribal knowledge).
> - Each region has at least one memstore (one per column family actually,
> let's just say one for the sake of argument).
>
> If you have 10T disks per region server then you need ~170 regions per
> region server (3*20G*170 ~ 10T).
> If you give the memstore 35% of your heap and have 128M memstores you
> would need 170*128M/0.35 ~ 60G of heap. That's already too large.
> If you make the memstores 600M, you'll need 170*600M/0.35 ~ 290G of heap
> (if all memstores are being written to simultaneously).
>
> There are ways to address that.
> If you expect that not all memstores are written to at the same time, you
> can leave them smaller and increase their size multipliers, which allows
> them to be temporarily larger.
>
> Again, this is just back of the envelope.
>
> This is a lengthy topic, I'm planning a blog post around this. There are a
> bunch of parameters that can be tweaked based on workload.
>
> The main takeaway for HBase is that you have to match disk space with
> Java heap.
>
> -- Lars
>
>
>
> ________________________________
>  From: Varun Sharma <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Thursday, January 17, 2013 3:24 PM
> Subject: Re: Hbase heap size
>
> Thanks for the info. I am looking for a balance where I have a write-heavy
> workload and need excellent read latency. So 40% to block cache for
> caching, 35% to memstore.
>
> But I would also like to reduce the number of HFiles and the amount of
> compaction activity. So, I want fewer regions and a much larger memstore
> flush size, like 640M. Could a large memstore flush be a problem in some
> sense? Are updates blocked on memstore flush? In my case, I would expect
> a 600M memstore to materialize into a 200-300M HFile.
>
> On Thu, Jan 17, 2013 at 2:31 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > A good rule of thumb that I found is to give each region server a Java
> > heap that is roughly 1/100th of the size of the disk space per region
> > server.
> > (that is assuming all the default settings: 10G regions, 128M
> > memstores, 40% of heap for memstores, 20% of heap for block cache,
> > 3-way replication)
> >
> >
> > That is, if you give the region server a 10G heap, you can expect to
> > be able to serve about 1T worth of disk space.
> >
> > That can be tweaked of course (increase the region size to 20G; if
> > your load is mostly read-only you can shrink the memstores, etc).
> > That way you can reduce that ratio to 1/200 or even less.
> >
> >
> > I'm sure other folks will have more detailed input.
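
The back-of-the-envelope heap estimates quoted in this thread can be reproduced with a short Python sketch. The function names are hypothetical, decimal units are assumed (1G = 1000M, 1T = 1000G), and the model is the simplified one from the thread: one memstore per region, all memstores full at the same time.

```python
def regions_per_server(disk_gb, region_gb, replication=3):
    """Regions needed to fill disk_gb of raw disk, given the replication factor."""
    return disk_gb / (replication * region_gb)


def heap_needed_gb(num_regions, memstore_mb, memstore_fraction):
    """Heap needed so that num_regions full memstores fit into the
    fraction of the heap reserved for memstores (decimal units)."""
    return num_regions * memstore_mb / memstore_fraction / 1000


# 10T of disk per region server with 20G regions: about 170 regions
print(round(regions_per_server(10_000, 20)))

# 170 regions, 35% of heap for memstores
print(round(heap_needed_gb(170, 128, 0.35)))  # roughly 60G
print(round(heap_needed_gb(170, 600, 0.35)))  # roughly 290G

# Default settings: 1T of disk, 10G regions, 128M memstores, 40% of heap
default_regions = regions_per_server(1_000, 10)
print(round(heap_needed_gb(default_regions, 128, 0.40), 1))  # close to 10G
```

The last line is the 1/100 rule of thumb: about 10G of heap per 1T of disk under the default settings.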