Re: HBase Thrift inserts bottlenecked somewhere -- but where?
Here are the parameters you should look at:

hbase-server/src/main/java/org/apache/hadoop/hbase/thrift/HThreadedSelectorServerArgs.java:
     "hbase.thrift.selector.threads";
hbase-server/src/main/java/org/apache/hadoop/hbase/thrift/HThreadedSelectorServerArgs.java:
     "hbase.thrift.worker.threads";
hbase-server/src/main/java/org/apache/hadoop/hbase/thrift/TBoundedThreadPoolServer.java:
     "hbase.thrift.threadKeepAliveTimeSec";

Oftentimes, the source code is the best help :-)
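
For reference, a minimal hbase-site.xml sketch on the Thrift server nodes
might look like the following. The values are illustrative placeholders, not
recommendations, and which settings take effect depends on which Thrift
server implementation you run (as the two source files above suggest):

    <!-- Illustrative values only; tune for your workload -->
    <property>
      <name>hbase.thrift.selector.threads</name>
      <value>4</value>
    </property>
    <property>
      <name>hbase.thrift.worker.threads</name>
      <value>64</value>
    </property>
    <property>
      <name>hbase.thrift.threadKeepAliveTimeSec</name>
      <value>60</value>
    </property>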

On Fri, Mar 1, 2013 at 10:49 AM, Dan Crosta <[EMAIL PROTECTED]> wrote:

> We are generating the load from multiple machines, yes.
>
> Do you happen to know what the setting for the number of ThriftServer
> threads is called? I can't find anything obviously related to that in the
> CDH manager.
>
> - Dan
>
>
> On Mar 1, 2013, at 1:46 PM, Varun Sharma wrote:
>
> > Did you try running 30-40 processes on one machine and another 30-40
> > processes on another machine to see if that doubles the throughput?
> >
> > On Fri, Mar 1, 2013 at 10:46 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> >
> >> Hi,
> >>
> >> I don't know how many worker threads you have at the Thrift servers.
> >> Each thread gets dedicated to a single connection and only serves that
> >> connection. New connections get queued. Also, are you sure that you are
> >> not saturating the client side making the calls?
> >>
> >> Varun
> >>
> >>
> >> On Fri, Mar 1, 2013 at 9:33 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> >>
> >>> The primary unit of load distribution in HBase is the region; make
> >>> sure you have more than one. This is well documented in the manual:
> >>> http://hbase.apache.org/book/perf.writing.html
> >>>
> >>> J-D
> >>>
> >>> On Fri, Mar 1, 2013 at 4:17 AM, Dan Crosta <[EMAIL PROTECTED]> wrote:
> >>>> We are using a 6-node HBase cluster with a Thrift Server on each of
> >>>> the RegionServer nodes, and trying to evaluate maximum write
> >>>> throughput for our use case (which involves many processes sending
> >>>> mutateRowsTs commands). Somewhere between about 30 and 40 processes
> >>>> writing into the system, we cross the threshold where adding
> >>>> additional writers yields only very limited gains in throughput, and
> >>>> I'm not sure why. We see that the CPU and disk on the
> >>>> DataNode/RegionServer/ThriftServer machines are not saturated, nor is
> >>>> the NIC in those machines. I'm a little unsure where to look next.
> >>>>
> >>>> A little more detail about our deployment:
> >>>>
> >>>> * CDH 4.1.2
> >>>> * DataNode/RegionServer/ThriftServer class: EC2 m1.xlarge
> >>>> ** RegionServer: 8GB heap
> >>>> ** ThriftServer: 1GB heap
> >>>> ** DataNode: 4GB heap
> >>>> ** EC2 ephemeral (i.e. local, not EBS) volumes used for HDFS
> >>>>
> >>>> If there's any other information that I can provide, or any other
> >>>> configuration or system settings I should look at, I'd appreciate the
> >>>> pointers.
> >>>>
> >>>> Thanks,
> >>>> - Dan
> >>>
> >>
> >>
>
>
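
A rough illustration of J-D's point about regions: pre-splitting the table at
creation time spreads writes across multiple regions, and therefore across
RegionServers, from the start instead of funneling everything into a single
initial region. The table name, column family, and split keys below are made
up for the example:

    hbase> # table, family, and split keys are placeholders
    hbase> create 'mytable', 'cf', {SPLITS => ['10', '20', '30', '40']}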