HBase >> mail # user >> HBase Thrift inserts bottlenecked somewhere -- but where?

Re: HBase Thrift inserts bottlenecked somewhere -- but where?
We are generating the load from multiple machines, yes.

Do you happen to know the name of the setting that controls the number of ThriftServer threads? I can't find anything obviously related to it in the CDH manager.

- Dan
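For anyone hitting the same wall: as far as I can tell, the Thrift server's worker pool is controlled by the hbase-site.xml properties below (property names as documented for HBase 0.92/0.94-era releases; the values shown are illustrative, and in CDH you may need to set them via a configuration safety valve rather than a named UI field):

```xml
<!-- hbase-site.xml on each ThriftServer node; values are illustrative -->
<property>
  <name>hbase.thrift.minWorkerThreads</name>
  <value>64</value>   <!-- threads kept alive in the pool -->
</property>
<property>
  <name>hbase.thrift.maxWorkerThreads</name>
  <value>1000</value> <!-- cap on concurrently served connections -->
</property>
<property>
  <name>hbase.thrift.maxQueuedRequests</name>
  <value>1000</value> <!-- connections allowed to wait for a free worker -->
</property>
```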
On Mar 1, 2013, at 1:46 PM, Varun Sharma wrote:

> Did you try running 30-40 processes on one machine and another 30-40
> on a second machine to see if that doubles the throughput?
> On Fri, Mar 1, 2013 at 10:46 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>> Hi,
>> I don't know how many worker threads you have at the thrift servers. Each
>> thread gets dedicated to a single connection and serves only that
>> connection; new connections get queued. Also, are you sure that you are not
>> saturating the client side making the calls?
>> Varun
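Varun's one-thread-per-connection point can be illustrated with a toy simulation (pure Python, no HBase involved; `handle_connection` and the constants are illustrative, not HBase API): with a fixed worker pool, connections beyond the pool size simply wait, which caps concurrency no matter how many clients you add.

```python
# Toy model of Thrift's thread-pool server: one worker per connection,
# excess connections queue. Tracks peak concurrency to show the cap.
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4      # analogous to the Thrift worker-thread limit
CONNECTIONS = 10   # client connections contending for workers

_lock = threading.Lock()
active = 0
peak = 0

def handle_connection(conn_id):
    """Pretend to serve one client connection for its whole lifetime."""
    global active, peak
    with _lock:
        active += 1
        peak = max(peak, active)
    threading.Event().wait(0.01)  # simulate serving mutateRows calls
    with _lock:
        active -= 1
    return conn_id

with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    results = list(pool.map(handle_connection, range(CONNECTIONS)))

print(peak)  # never exceeds POOL_SIZE, however many connections arrive
```

The same reasoning suggests checking the server-side pool size before adding more client processes: past the pool limit, extra clients only lengthen the queue.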
>> On Fri, Mar 1, 2013 at 9:33 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>>> The primary unit of load distribution in HBase is the region; make
>>> sure you have more than one. This is well documented in the manual:
>>> http://hbase.apache.org/book/perf.writing.html
>>> J-D
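A companion to J-D's pre-splitting advice is to salt row keys so that writes actually spread across the pre-created regions instead of hammering one. A minimal sketch (the two-digit-prefix scheme and `N_BUCKETS` are assumptions for illustration, not something from this thread; it presumes the table was pre-split on the salt prefix):

```python
# Spread sequential row keys across N pre-split regions by prefixing a
# deterministic salt bucket, a common fix for single-region write hotspots.
import zlib
from collections import Counter

N_BUCKETS = 8  # assumed number of pre-split regions / salt prefixes

def salted_key(key: str) -> str:
    """Prefix the key with a stable bucket id derived from the key itself."""
    bucket = zlib.crc32(key.encode()) % N_BUCKETS  # cheap, deterministic
    return "%02d-%s" % (bucket, key)

keys = ["event-%06d" % i for i in range(1000)]
spread = Counter(salted_key(k).split("-", 1)[0] for k in keys)
print(sorted(spread))  # keys land under at most N_BUCKETS distinct prefixes
```

The trade-off is that scans for a key range must now fan out over all `N_BUCKETS` prefixes, so this suits write-heavy workloads like the one described here.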
>>> On Fri, Mar 1, 2013 at 4:17 AM, Dan Crosta <[EMAIL PROTECTED]> wrote:
>>>> We are using a 6-node HBase cluster with a Thrift Server on each of the
>>>> RegionServer nodes, and trying to evaluate maximum write throughput for our
>>>> use case (which involves many processes sending mutateRowsTs commands).
>>>> Somewhere between about 30 and 40 processes writing into the system we
>>>> cross a threshold where adding additional writers yields only very
>>>> limited throughput gains, and I'm not sure why. We see that the CPU
>>>> and disk on the DataNode/RegionServer/ThriftServer machines are not
>>>> saturated, nor is the NIC in those machines. I'm a little unsure where to
>>>> look next.
>>>> A little more detail about our deployment:
>>>> * CDH 4.1.2
>>>> * DataNode/RegionServer/ThriftServer class: EC2 m1.xlarge
>>>> ** RegionServer: 8GB heap
>>>> ** ThriftServer: 1GB heap
>>>> ** DataNode: 4GB heap
>>>> ** EC2 ephemeral (i.e. local, not EBS) volumes used for HDFS
>>>> If there's any other information that I can provide, or any other
>>>> configuration or system settings I should look at, I'd appreciate the
>>>> pointers.
>>>> Thanks,
>>>> - Dan