HBase >> mail # user >> Read thruput


Hi All,

I am trying to use HBase for real-time data retrieval with a timeout of 50
ms.

I am using 2 machines as datanodes and region servers,
and one machine as the master for both Hadoop and HBase.

But I am only able to fire 3000 queries per second, and 10% of them are
timing out.
The database has 60 million rows.
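
For a rough sense of scale, the figures above can be turned into a few back-of-envelope numbers. This is only a sketch: it assumes the load is spread evenly over the two region servers and that the client gives up on a request once the 50 ms timeout expires, neither of which is confirmed here. All variable names are made up for illustration.

```python
# Back-of-envelope numbers derived from the figures quoted in the post.
QPS = 3000                # queries per second reported in the post
TIMEOUT_S = 0.050         # 50 ms client-side timeout from the post
TIMEOUT_FRACTION = 0.10   # 10% of queries time out
REGION_SERVERS = 2        # machines running a region server

def mean_in_flight(qps, latency_s):
    """Little's law: mean concurrency = arrival rate * mean latency."""
    return qps * latency_s

# Assumption: queries are spread evenly over both region servers.
per_server_qps = QPS / REGION_SERVERS

# Queries per second that miss the 50 ms deadline.
timed_out_per_s = QPS * TIMEOUT_FRACTION

# Assumption: no request is waited on for longer than the timeout, so the
# mean latency is at most TIMEOUT_S and this is an upper bound.
max_in_flight = mean_in_flight(QPS, TIMEOUT_S)

print("queries/s per region server:", per_server_qps)
print("timed-out queries/s:", timed_out_per_s)
print("upper bound on mean in-flight requests:", max_in_flight)
```

Under those assumptions, each region server sees about 1500 queries per second, about 300 queries per second miss the deadline, and on average no more than about 150 requests are outstanding at once.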

Are these figures OK, or am I missing something?
I have set scanner caching to 1, because each request fetches only a
single row.

Here are the various configurations:

*Our schema*
{NAME => 'mytable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', COMPRESSION =>
'GZ', VERSIONS => '1', TTL => '2147483647', MIN_VERSIONS => '0',
KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '8192', ENCODE_ON_DISK => 'true',
IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}

*Configuration*
1 machine running both the HBase master and the Hadoop master
2 machines each running both a region server and a datanode
285 regions in total

*Machine Level Optimizations:*
a) Number of file descriptors is 1000000 (ulimit -n gives 1000000)
b) Increased the read-ahead value to 4096
c) Mounted the disks with noatime,nodiratime

*Hadoop Optimizations:*
dfs.datanode.max.xcievers = 4096
dfs.block.size = 33554432
dfs.datanode.handler.count = 256
io.file.buffer.size = 65536
Hadoop data is split across 4 directories, so that different disks are
accessed
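
The byte values above are easier to compare in binary units. A minimal sketch, assuming nothing beyond the two size settings listed; the helper name human_bytes is made up for illustration:

```python
def human_bytes(n):
    """Format a byte count using binary units (KiB, MiB, GiB)."""
    units = ["B", "KiB", "MiB", "GiB"]
    value = float(n)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit we know about.
        if value < 1024 or unit == units[-1]:
            return f"{value:g} {unit}"
        value /= 1024

# Size-valued settings copied from the Hadoop list above.
settings = {
    "dfs.block.size": 33554432,
    "io.file.buffer.size": 65536,
}

for name, size in settings.items():
    print(f"{name} = {size} ({human_bytes(size)})")
```

This shows dfs.block.size as 32 MiB and io.file.buffer.size as 64 KiB.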

*Hbase Optimizations*:

hbase.client.scanner.caching=1  # We have specifically added this, as we
always return one row.
hbase.regionserver.handler.count=3200
hfile.block.cache.size=0.35
hbase.hregion.memstore.mslab.enabled=true
hfile.min.blocksize.size=16384
hfile.min.blocksize.size=4
hbase.hstore.blockingStoreFiles=200
hbase.regionserver.optionallogflushinterval=60000
hbase.hregion.majorcompaction=0
hbase.hstore.compaction.max=100
hbase.hstore.compactionThreshold=100
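
Note that hfile.min.blocksize.size appears twice in the list above, with different values. A minimal sketch of a check that flags such repeated keys, assuming the settings are available as plain key=value lines (the function name find_duplicate_keys is made up for illustration, and the sketch does not say which value HBase would actually use):

```python
from collections import defaultdict

# The key=value lines from the HBase list above, without the comment text.
CONFIG_LINES = """\
hbase.client.scanner.caching=1
hbase.regionserver.handler.count=3200
hfile.block.cache.size=0.35
hbase.hregion.memstore.mslab.enabled=true
hfile.min.blocksize.size=16384
hfile.min.blocksize.size=4
hbase.hstore.blockingStoreFiles=200
hbase.regionserver.optionallogflushinterval=60000
hbase.hregion.majorcompaction=0
hbase.hstore.compaction.max=100
hbase.hstore.compactionThreshold=100
"""

def find_duplicate_keys(text):
    """Return {key: [values...]} for every key that occurs more than once."""
    values = defaultdict(list)
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()].append(value.strip())
    return {k: v for k, v in values.items() if len(v) > 1}

duplicates = find_duplicate_keys(CONFIG_LINES)
print(duplicates)  # {'hfile.min.blocksize.size': ['16384', '4']}
```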

*Hbase-GC*
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
*Hadoop-GC*
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC

-Vibhav

Azuryy Yu 2013-04-01, 11:33
Ted Yu 2013-04-01, 18:33
Vibhav Mundra 2013-04-02, 06:26
lars hofhansl 2013-04-02, 04:30
ramkrishna vasudevan 2013-04-01, 10:16
Vibhav Mundra 2013-04-01, 10:47
Ted 2013-04-01, 10:53
Vibhav Mundra 2013-04-01, 11:57
Ted Yu 2013-04-01, 16:50
Vibhav Mundra 2013-04-01, 17:50
Vibhav Mundra 2013-04-01, 17:59
Asaf Mesika 2013-04-01, 20:12
Vibhav Mundra 2013-04-02, 06:36
Asaf Mesika 2013-04-04, 04:21