HBase, mail # user - HBase Thrift inserts bottlenecked somewhere -- but where?


Re: HBase Thrift inserts bottlenecked somewhere -- but where?
lars hofhansl 2013-03-02, 03:42
What performance profile do you expect?
Where does it top out (i.e. how many ops/sec)?

Also note that each data item is replicated to three nodes by HDFS (its default replication factor is 3), so in a 6-machine cluster each machine receives, on average, 50% of the writes.
If you are looking for performance, you really need a larger cluster to amortize this replication cost across more machines.
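
The 50% figure is just the replication factor divided by the node count. A minimal sketch of that arithmetic, assuming writes and replicas are spread evenly across the cluster (the function name and the 12- and 24-node cluster sizes are illustrative, not from the thread):

```python
def write_share_per_node(replication_factor: int, num_nodes: int) -> float:
    """Average fraction of all written data that each node must store.

    Assumes replicas are spread evenly across the cluster, which is an
    idealization of HDFS block placement.
    """
    if replication_factor <= 0 or num_nodes <= 0:
        raise ValueError("replication_factor and num_nodes must be positive")
    # A block cannot have more replicas than there are nodes.
    effective_replicas = min(replication_factor, num_nodes)
    return effective_replicas / num_nodes


if __name__ == "__main__":
    # 6 nodes is the cluster from the thread; 12 and 24 are hypothetical.
    for nodes in (6, 12, 24):
        share = write_share_per_node(3, nodes)
        print(f"{nodes} nodes: each stores {share:.1%} of all writes")
```

Under these assumptions, doubling the cluster halves the per-machine share of the replicated write load.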

The other issue to watch out for is whether your keys are generated in such a way that a single RegionServer becomes a hot spot (you can check the per-server operation count on the master page).
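
If the master page does show one server taking most of the operations, one common mitigation (not something proposed in this thread) is to prefix each row key with a hash-derived "salt" so that sequential keys land in different regions. A rough sketch, assuming six buckets to match the six nodes and a hypothetical `salted_key` helper; note that salting makes range scans more expensive, since a scan must then cover every bucket:

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 6  # assumption: one bucket per RegionServer in the 6-node cluster


def salted_key(row_key: bytes, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix row_key with a stable two-digit bucket derived from its hash."""
    bucket = hashlib.md5(row_key).digest()[0] % buckets
    return b"%02d-" % bucket + row_key


if __name__ == "__main__":
    # Hypothetical monotonically increasing keys, which would otherwise
    # all fall into the same (last) region.
    keys = [b"event-%08d" % i for i in range(1000)]
    counts = Counter(salted_key(k)[:2] for k in keys)
    for prefix, n in sorted(counts.items()):
        print(prefix.decode(), n)
```

The salt is computed from the key itself, so a reader that knows the original key can recompute the prefix for point lookups.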

-- Lars

________________________________
 From: Dan Crosta <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Friday, March 1, 2013 4:17 AM
Subject: HBase Thrift inserts bottlenecked somewhere -- but where?
 
We are using a 6-node HBase cluster with a Thrift server on each of the RegionServer nodes, and are trying to evaluate maximum write throughput for our use case (which involves many processes sending mutateRowsTs commands). Somewhere between about 30 and 40 writer processes, we cross a threshold beyond which adding writers yields only very limited gains in throughput, and I'm not sure why. Neither the CPU nor the disks on the DataNode/RegionServer/ThriftServer machines are saturated, nor are the NICs in those machines. I'm a little unsure where to look next.

A little more detail about our deployment:

* CDH 4.1.2
* DataNode/RegionServer/ThriftServer class: EC2 m1.xlarge
** RegionServer: 8GB heap
** ThriftServer: 1GB heap
** DataNode: 4GB heap
** EC2 ephemeral (i.e. local, not EBS) volumes used for HDFS

If there's any other information that I can provide, or any other configuration or system settings I should look at, I'd appreciate the pointers.

Thanks,
- Dan