HBase, mail # user - Read thruput


Re: Read thruput
Vibhav Mundra 2013-04-02, 06:26
Yes, we are running with 8GB, so I don't think that should be a concern.

-Vibhav
On Tue, Apr 2, 2013 at 12:03 AM, Ted Yu <[EMAIL PROTECTED]> wrote:

> Your hbase.regionserver.handler.count seems very high. The following is
> from hbase-default.xml:
>
>     For an estimate of server-side memory-used, evaluate
>     hbase.client.write.buffer * hbase.regionserver.handler.count
>
> In your case, the above product would be 6GB :-)
>
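
For reference, Ted's product can be checked with quick arithmetic. This assumes the default hbase.client.write.buffer of 2 MB, since the actual value is not stated in the thread:

```python
# Server-side memory estimate from hbase-default.xml:
#   hbase.client.write.buffer * hbase.regionserver.handler.count
write_buffer = 2 * 1024 * 1024   # 2 MB -- the HBase default (assumed; not posted in this thread)
handler_count = 3200             # from the configuration posted below

estimate_bytes = write_buffer * handler_count
print(estimate_bytes / 2**30)    # ~6.25 GiB, matching the "6GB" figure
```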
>
> On Mon, Apr 1, 2013 at 3:09 AM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
>
> > Hi All,
> >
> > I am trying to use HBase for real-time data retrieval with a timeout of
> > 50 ms.
> >
> > I am using 2 machines as datanodes and regionservers,
> > and one machine as the master for Hadoop and HBase.
> >
> > But I am able to fire only 3000 queries per second, and 10% of them
> > are timing out.
> > The database has 60 million rows.
> >
> > Are these figures okay, or am I missing something?
> > I have set the scanner caching to one, because each request fetches
> > a single row only.
> >
> > Here are the various configurations:
> >
> > *Our schema*
> > {NAME => 'mytable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
> > 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', COMPRESSION =>
> > 'GZ', VERSIONS => '1', TTL => '2147483647', MIN_VERSIONS => '0',
> > KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '8192', ENCODE_ON_DISK => 'true',
> > IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> >
> > *Configuration*
> > 1 machine hosting both the HBase and Hadoop masters
> > 2 machines each running both a region server and a datanode
> > 285 regions in total
> >
> > *Machine Level Optimizations:*
> > a) Number of file descriptors is 1000000 (ulimit -n gives 1000000)
> > b) Increased the read-ahead value to 4096
> > c) Added noatime,nodiratime to the disk mounts
> >
> > *Hadoop Optimizations:*
> > dfs.datanode.max.xcievers = 4096
> > dfs.block.size = 33554432
> > dfs.datanode.handler.count = 256
> > io.file.buffer.size = 65536
> > Hadoop data is split across 4 directories, so that different disks
> > are accessed in parallel
> >
> > *Hbase Optimizations*:
> >
> > hbase.client.scanner.caching=1  # We specifically added this, as we
> > always return a single row.
> > hbase.regionserver.handler.count=3200
> > hfile.block.cache.size=0.35
> > hbase.hregion.memstore.mslab.enabled=true
> > hfile.min.blocksize.size=16384
> > hfile.min.blocksize.size=4
> > hbase.hstore.blockingStoreFiles=200
> > hbase.regionserver.optionallogflushinterval=60000
> > hbase.hregion.majorcompaction=0
> > hbase.hstore.compaction.max=100
> > hbase.hstore.compactionThreshold=100
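
For reference, each of the key=value pairs above corresponds to a property element in hbase-site.xml. A sketch for the handler count (the value simply mirrors the posted configuration, which Ted's reply suggests is far higher than typical):

```xml
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>3200</value>
</property>
```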
> >
> > *Hbase-GC
> > *-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
> > -XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
> > *Hadoop-GC*
> > -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> >
> > -Vibhav
> >
>