Getting the maximum performance out of HBase isn't just about tuning
the cluster. There are several other factors to take into account, the
two most important being:
1. Your schema design
2. How you are using the client APIs
Starting with the default configs is okay. Are you seeing performance
or stability issues? If so, start by knocking those out.
PS: I have covered several tuning concepts in HBase in Action, and
there is plenty of information available in the online HBase manual and
Lars' book as well. Refer to those if you want to understand the more
general concepts that are at play.
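On the API-usage point: for the wide-row, fully random read pattern described below, single-row Gets are the natural call, and restricting the Get to the family you need avoids pulling anything extra. A minimal sketch against the 0.94-era client API (the table name "mytable", column family "cf", and row key are placeholders, not anything from this thread):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");  // placeholder table
        try {
            // Random single-row access: use a Get, not a Scan. Note that
            // scanner caching does not apply to Gets at all.
            Get get = new Get(Bytes.toBytes("row-key"));  // placeholder key
            get.addFamily(Bytes.toBytes("cf"));  // fetch all ~1000 columns
            Result result = table.get(get);
            System.out.println("cells read: " + result.size());
        } finally {
            table.close();
        }
    }
}
```

This won't compile without hbase-client (and its Hadoop dependencies) on the classpath; treat it as a sketch of the call pattern, not a drop-in program.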
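Once you do need to tune, the knobs Kevin lists below map roughly onto these settings. The values here are illustrative starting points, not recommendations, and the property names are from the 0.92/0.94 era, so verify them against the manual for your version:

```xml
<!-- hbase-site.xml (illustrative values only) -->
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value> <!-- raise for write-heavy workloads -->
</property>
<property>
  <name>hfile.block.cache.size</name>
  <value>0.25</value> <!-- raise for read-heavy workloads -->
</property>
<property>
  <name>hbase.client.scanner.caching</name>
  <value>100</value> <!-- lower for wide rows/large cells -->
</property>
<!-- The HFile block size is set per column family on the table, e.g.
     HColumnDescriptor.setBlocksize(...), not in hbase-site.xml. -->
```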
On Oct 5, 2012, at 7:16 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> I have time-series data and each row has up to 1000 columns. I just
> started with the defaults and have not tuned any parameters on the client
> or server. My reads fetch all the columns in a row, but the row requested
> is completely random.
> On Fri, Oct 5, 2012 at 6:05 PM, Kevin O'Dell <[EMAIL PROTECTED]> wrote:
>> Michael is right: most parameters usually go one way or the other depending
>> on what you are trying to accomplish.
>> Memstore - raise for write-heavy workloads
>> Blockcache - raise for read-heavy workloads
>> HBase blocksize - higher for sequential workloads, lower for random
>> Client caching - lower for really wide rows/large cells, higher for tall
>> tables/small cells
>> On Fri, Oct 5, 2012 at 8:54 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>>> What sort of system are you tuning?
>>> Sorry, but we have to start somewhere and if we don't know what you have
>>> in terms of hardware, we don't have a good starting point.
>>> On Oct 5, 2012, at 7:47 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>>>> Do most people start out with default values and then tune HBase? Or
>>>> are there some important configuration parameters that should always be
>>>> set on the client and the server?
>> Kevin O'Dell
>> Customer Operations Engineer, Cloudera