Re: High throughput input, low latency output?
Matt Corgan 2011-10-07, 20:15
We found that 2 cores are not enough to run HBase. One core can easily get
tied up with a compaction while the other is doing garbage collection. That
doesn't leave any headroom for gets/scans, especially on compressed data
and/or when several are running at once. Try to do all of that
at the same time and some of the other background tasks start choking, like
We run c1.xlarge instances (8 cores, 8 GB memory) and everything works well,
though that leaves little room for the block cache.
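
A rough sketch of the headroom arithmetic above, assuming a compaction and a
garbage collection each pin one full core (a simplification; the function
name and its defaults are illustrative only, not HBase settings):

```python
def spare_cores(total_cores, compactions=1, gc_threads=1):
    """Cores left for gets/scans once background work has pinned some.

    Assumes each compaction and each GC occupies a whole core, which is a
    simplification of how a region server actually schedules its work.
    """
    return max(total_cores - compactions - gc_threads, 0)


# Compare a 2-core box with an 8-core box such as c1.xlarge.
for cores in (2, 8):
    print(cores, "cores ->", spare_cores(cores), "free for reads")
```

On 2 cores nothing is left over for reads while both background tasks are
running, which matches what we saw.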
On Fri, Oct 7, 2011 at 12:43 PM, Anthony Urso <[EMAIL PROTECTED]> wrote:
> We have a use case that will require a ten to twenty EC2 node HBase
> cluster to take several hundred million rows of input from a larger
> number of EMR instances in daily bursts, and then serve those rows via
> low latency random reads, say on the order of 300 or so rows per
> second. Before we start coding, I thought it best to ask the experts
> for their advice.
> 1) Is this something that HBase will be able to handle gracefully?
> 2) Does anyone have any pointers on how to tune HBase for performance
> and stability under this load?
> 3) Would HBase perform better under this sort of load on twelve large
> EC2 instances, six xlarge or three xxlarge?