Re: HBase Random Read latency > 100ms
Adding to what Lars said, you can enable bloom filters on column families
for read performance.
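In the HBase shell that's something like the following (the table and family
names here are placeholders; with YCSB the table is typically 'usertable').
Note that existing HFiles only get bloom blocks once they are rewritten,
hence the compaction at the end:

hbase> disable 'usertable'
hbase> alter 'usertable', {NAME => 'f1', BLOOMFILTER => 'ROW'}
hbase> enable 'usertable'
hbase> major_compact 'usertable'    # rewrite HFiles so they carry bloom blocks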
On Mon, Oct 7, 2013 at 10:51 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Have you enabled short circuit reading? See here:
> http://hbase.apache.org/book/perf.hdfs.html
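> For CDH4 that boils down to roughly the following in hdfs-site.xml on the
> DataNodes (and in the client config the RegionServers see); the property
> names below are the Hadoop 2.x ones, so double-check against your version:
>
> <property>
>   <name>dfs.client.read.shortcircuit</name>
>   <value>true</value>
> </property>
>
> <property>
>   <name>dfs.domain.socket.path</name>
>   <value>/var/run/hadoop-hdfs/dn._PORT</value>
> </property>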
>
> How's your data locality (shown on the RegionServer UI page)?
>
>
> How much memory are you giving your RegionServers?
> If your reads are truly random and the data set does not fit into the
> aggregate cache, you'll be dominated by the disk and network.
> Each read would need to bring in a 64k (default) HFile block. If short
> circuit reading is not enabled, you'll get two or three context switches.
>
> So I would try (rough sketches for 2-4 after the list):
> 1. Enable short circuit reading
> 2. Increase the block cache size per RegionServer
> 3. Decrease the HFile block size
> 4. Make sure your data is local (if it is not, issue a major compaction).
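> For 2-4, roughly (the values are only illustrative; tune them for your heap
> and access pattern):
>
> <!-- hbase-site.xml on the RegionServers: larger block cache -->
> <property>
>   <name>hfile.block.cache.size</name>
>   <value>0.4</value>  <!-- fraction of the RS heap; the default is 0.25 -->
> </property>
>
> # In the HBase shell: smaller HFile blocks for point gets, then rewrite the
> # store files (a major compaction also brings the data local to each RS)
> hbase> alter 'usertable', {NAME => 'f1', BLOCKSIZE => '16384'}
> hbase> major_compact 'usertable'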
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Ramu M S <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Sunday, October 6, 2013 10:01 PM
> Subject: HBase Random Read latency > 100ms
>
>
> Hi All,
>
> My HBase cluster has 8 Region Servers (CDH 4.4.0, HBase 0.94.6).
>
> Each Region Server has the following configuration:
> 16-core CPU, 192 GB RAM, 800 GB SATA (7200 RPM) disk
> (unfortunately configured with RAID 1; I can't change this, as the machines
> are leased temporarily for a month).
>
> I am running YCSB benchmark tests on HBase and currently inserting around
> 1.8 Billion records.
> (1 Key + 7 Fields of 100 Bytes = 724 Bytes per record)
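> The load is driven with roughly these parameters (flags abbreviated and the
> column family name is a placeholder; exact names depend on the YCSB version):
>
> bin/ycsb load hbase -P workloads/workloada -threads 32 \
>     -p columnfamily=f1 -p fieldcount=7 -p fieldlength=100 \
>     -p recordcount=1800000000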
>
> Currently I am getting a write throughput of around 100K OPS, but random
> reads are very slow: every get takes 100 ms or more.
>
> I have changed the following default configuration:
> 1. HFile size: 16 GB
> 2. HDFS block size: 512 MB
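> i.e. roughly the following (the first being the maximum region size; the
> exact property names depend on the CDH version):
>
> <!-- hbase-site.xml -->
> <property>
>   <name>hbase.hregion.max.filesize</name>
>   <value>17179869184</value>  <!-- 16 GB -->
> </property>
>
> <!-- hdfs-site.xml -->
> <property>
>   <name>dfs.blocksize</name>
>   <value>536870912</value>  <!-- 512 MB -->
> </property>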
>
> Total data size is around 1.8 TB (excluding replicas).
> My table is split into 128 regions (no pre-splitting used; it started with 1
> region and grew to 128 over the course of the inserts).
>
> Taking some inputs from earlier discussions, I have made the following
> changes to disable Nagle's algorithm (in both the client and server
> hbase-site.xml and hdfs-site.xml):
>
> <property>
>   <name>hbase.ipc.client.tcpnodelay</name>
>   <value>true</value>
> </property>
>
> <property>
>   <name>ipc.server.tcpnodelay</name>
>   <value>true</value>
> </property>
>
> Ganglia stats show high CPU I/O wait (>30% during reads).
>
> I agree that the disk configuration is not ideal for a Hadoop cluster, but
> as mentioned earlier it can't be changed for now.
> Even so, the latency seems far worse than any reported results I have seen.
>
> Any pointers on what can be wrong?
>
> Thanks,
> Ramu
>

--
Bharath Vissapragada
<http://www.cloudera.com>