Asaf Mesika 2013-01-11, 13:03
Bryan Keller 2013-01-11, 17:37
Bryan Keller 2013-01-15, 17:28
Andrew Purtell 2013-01-15, 17:48
Re: Maximizing throughput
anil gupta 2013-01-15, 20:04
Hi Bryan,
Nice that you figured out the bottleneck. However, with your current hardware configuration, disk I/O might become your bottleneck in the future, since you have only 6 disks and 12 cores. Try to bring the cores-to-disks ratio (no. of cores / no. of disks) closer to 1 for greater throughput and better utilization of hardware resources.

I have a 7-node HBase cluster (2 admin and 5 worker nodes), with each node having 12 cores, 11 disks, and 48 GB RAM. After tuning the GC and other parameters, I easily achieved a write load of 2200 requests per second per node (sustained over a 5-hour load test loading 200 million rows). I still need to test the upper limit of the write load, and I haven't done a read load test yet. I used the PerformanceEvaluation utility of HBase for this.

HTH,
Anil Gupta

On Fri, Jan 11, 2013 at 9:37 AM, Bryan Keller <[EMAIL PROTECTED]> wrote:

> Thanks for the responses. I'm running HBase 0.92.1 (Cloudera CDH4).
>
> The program is very simple, it inserts batches of rows into a table via
> multiple threads. I've tried running it with different parameters (column
> count, threads, batch size, etc.), but throughput didn't improve. I've
> pasted the code here: http://pastebin.com/gPXfdkPy
>
> I have auto flush on (default) as I am inserting rows in batch so don't
> need to use the internal HTable write buffer.
>
> I've posted my config as well: http://pastebin.com/LVG9h6Z4
>
> The regionservers have 12 cores (24 with HT), 128 GB RAM, 6 SCSI drives.
> Max throughput is 90-100mb/sec on a drive. I've also tested this on an EC2
> High I/O instance type with 2 SSDs, 64GB RAM, and 8 cores (16 with HT).
> Both the EC2 and my colo cluster have the same issue of seemingly
> underutilizing resources.
>
> I measure disk usage using iostat and measured the theoretical max using
> hdparm dd. I use iftop to monitor network bandwidth usage, and used iperf
> to test theoretical max. CPU usage I use top and iostat.
>
> The maximum write performance I'm getting is usually around 20mb/sec on a
> drive (this is my colo cluster) on each of the 2 data nodes. That's about
> 20% of the max, and it is only sporadic, not a consistent 20mb/sec per
> drive. Network usage also seems to top out around 20% (200mbit/sec) to each
> node. CPU usage on each node is around 10%. The problem is more pronounced
> on EC2 which has much higher theoretical limits for storage and network I/O.
>
> Copying a 133gb file to HDFS looks like it gives similar performance as
> HBase (sporadic disk usage topping out at 20%, low CPU, 30-40% network I/O)
> so it seems this is more of an HDFS issue than an HBase issue.

--
Thanks & Regards,
Anil Gupta
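
For anyone following the numbers in this thread, here is a rough sketch of the arithmetic behind them. It only uses figures quoted above; the helper names are made up for illustration, and the row count assumes one row per write request spread across the 5 worker nodes, which the thread does not state explicitly.

```python
def cores_per_disk(cores: int, disks: int) -> float:
    """Cores-to-disks ratio; the advice above is to keep it close to 1."""
    return cores / disks


def utilization(observed: float, maximum: float) -> float:
    """Fraction of the measured maximum that is actually being used."""
    return observed / maximum


# Bryan's region servers: 12 physical cores, 6 drives.
bryan_ratio = cores_per_disk(12, 6)

# Anil's worker nodes: 12 cores, 11 disks.
anil_ratio = cores_per_disk(12, 11)

# Observed ~20 MB/s per drive against a measured max of 90-100 MB/s.
disk_util_low = utilization(20, 100)
disk_util_high = utilization(20, 90)

# Assumption: 1 request = 1 row, 5 worker nodes, 5 hours at 2200 req/s/node.
rows_in_test = 2200 * 5 * 5 * 3600

print(f"cores/disk: Bryan {bryan_ratio:.2f}, Anil {anil_ratio:.2f}")
print(f"disk utilization: {disk_util_low:.0%} to {disk_util_high:.0%}")
print(f"rows written in the 5-hour test: {rows_in_test:,}")
```

Under those assumptions the 5-hour figure works out to roughly the 200 million rows mentioned, and the per-drive utilization lands near the 20% Bryan reported.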