Re: Read thruput
I have used the following page:
http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow

to reduce the size of the block cache.
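
For reference, a quick way to check what the client resolves
hfile.block.cache.size to from hbase-site.xml (a minimal sketch against the
0.94-era API; we set 0.35 below, and 0.25f is just HBase's stock default
used as the fallback):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ShowBlockCache {
  public static void main(String[] args) {
    // Reads hbase-site.xml from the classpath; falls back to the 0.25
    // stock default if the property is not set.
    Configuration conf = HBaseConfiguration.create();
    System.out.println(conf.getFloat("hfile.block.cache.size", 0.25f));
  }
}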

-Vibhav
On Mon, Apr 1, 2013 at 4:23 PM, Ted <[EMAIL PROTECTED]> wrote:

> Can you increase the block cache size?
>
> What version of HBase are you using?
>
> Thanks
>
> On Apr 1, 2013, at 3:47 AM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
>
> > The typical size of each of my rows is less than 1 KB.
> >
> > Regarding memory, I have given 8 GB to the HBase region servers and 4
> > GB to the datanodes, and I don't see them fully used, so I ruled out
> > the GC aspect.
> >
> > In case you still believe that GC is an issue, I will upload the GC logs.
> >
> > -Vibhav
> >
> >
> > On Mon, Apr 1, 2013 at 3:46 PM, ramkrishna vasudevan <
> > [EMAIL PROTECTED]> wrote:
> >
> >> Hi
> >>
> >> How big are your rows? Are they wide rows, and what is the size of
> >> each cell?
> >> How many read threads are being used?
> >>
> >>
> >> Were you able to take a thread dump when this was happening? Have you
> >> seen the GC log?
> >> We may need some more info before we can diagnose the problem.
> >>
> >> Regards
> >> Ram
> >>
> >>
> >> On Mon, Apr 1, 2013 at 3:39 PM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
> >>
> >>> Hi All,
> >>>
> >>> I am trying to use HBase for real-time data retrieval with a timeout
> >>> of 50 ms.
> >>>
> >>> I am using 2 machines as datanodes and region servers,
> >>> and one machine as the master for both Hadoop and HBase.
> >>>
> >>> But I am able to fire only 3000 queries per second, and 10% of them
> >>> are timing out.
> >>> The database has 60 million rows.
> >>>
> >>> Are these figures okay, or am I missing something?
> >>> I have set scanner caching to one, because each request fetches a
> >>> single row only.
> >>>
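> >>> For context, a minimal sketch of that single-row fetch with the
> >>> 0.94-era Java client (the row-key bounds below are placeholders):
> >>>
> >>> import org.apache.hadoop.hbase.HBaseConfiguration;
> >>> import org.apache.hadoop.hbase.client.*;
> >>> import org.apache.hadoop.hbase.util.Bytes;
> >>>
> >>> public class SingleRowFetch {
> >>>   public static void main(String[] args) throws Exception {
> >>>     HTable table = new HTable(HBaseConfiguration.create(), "mytable");
> >>>     Scan scan = new Scan(Bytes.toBytes("row-A"), Bytes.toBytes("row-B"));
> >>>     scan.setCaching(1); // matches hbase.client.scanner.caching=1
> >>>     ResultScanner scanner = table.getScanner(scan);
> >>>     Result row = scanner.next(); // we expect exactly one row back
> >>>     System.out.println(row);
> >>>     scanner.close();
> >>>     table.close();
> >>>   }
> >>> }
> >>>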
> >>> Here are the various configurations:
> >>>
> >>> *Our schema*
> >>> {NAME => 'mytable', FAMILIES => [{NAME => 'cf',
> >>> DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL',
> >>> REPLICATION_SCOPE => '0', COMPRESSION => 'GZ', VERSIONS => '1',
> >>> TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false',
> >>> BLOCKSIZE => '8192', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false',
> >>> BLOCKCACHE => 'true'}]}
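> >>>
> >>> (For reference, the same schema expressed with the 0.94-era Java admin
> >>> API; a sketch of an equivalent definition, not how the table was
> >>> actually created:)
> >>>
> >>> import org.apache.hadoop.hbase.*;
> >>> import org.apache.hadoop.hbase.client.HBaseAdmin;
> >>> import org.apache.hadoop.hbase.io.hfile.Compression;
> >>> import org.apache.hadoop.hbase.regionserver.StoreFile;
> >>>
> >>> public class CreateMytable {
> >>>   public static void main(String[] args) throws Exception {
> >>>     HColumnDescriptor cf = new HColumnDescriptor("cf");
> >>>     cf.setBloomFilterType(StoreFile.BloomType.ROWCOL);
> >>>     cf.setCompressionType(Compression.Algorithm.GZ);
> >>>     cf.setMaxVersions(1);
> >>>     cf.setBlocksize(8192);
> >>>     HTableDescriptor desc = new HTableDescriptor("mytable");
> >>>     desc.addFamily(cf);
> >>>     new HBaseAdmin(HBaseConfiguration.create()).createTable(desc);
> >>>   }
> >>> }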
> >>>
> >>> *Configuration*
> >>> 1 machine hosting both the HBase and Hadoop masters
> >>> 2 machines each hosting both a region server and a datanode
> >>> 285 regions in total
> >>>
> >>> *Machine Level Optimizations:*
> >>> a) Number of file descriptors raised to 1000000 (ulimit -n gives 1000000)
> >>> b) Read-ahead value increased to 4096
> >>> c) noatime,nodiratime added to the data-disk mount options
> >>>
> >>> *Hadoop Optimizations:*
> >>> dfs.datanode.max.xcievers = 4096
> >>> dfs.block.size = 33554432
> >>> dfs.datanode.handler.count = 256
> >>> io.file.buffer.size = 65536
> >>> Hadoop data is split across 4 directories, so that different disks
> >>> are accessed.
> >>>
> >>> *HBase Optimizations*:
> >>>
> >>> hbase.client.scanner.caching=1  # We specifically set this to 1, as we
> >>> always return a single row.
> >>> hbase.regionserver.handler.count=3200
> >>> hfile.block.cache.size=0.35
> >>> hbase.hregion.memstore.mslab.enabled=true
> >>> hfile.min.blocksize.size=16384
> >>> hfile.min.blocksize.size=4
> >>> hbase.hstore.blockingStoreFiles=200
> >>> hbase.regionserver.optionallogflushinterval=60000
> >>> hbase.hregion.majorcompaction=0
> >>> hbase.hstore.compaction.max=100
> >>> hbase.hstore.compactionThreshold=100
> >>>
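> >>> Given the 50 ms budget, a rough per-request latency probe we can run
> >>> against this setup (a sketch with the 0.94-era client; the key pattern
> >>> and loop count are made up):
> >>>
> >>> import org.apache.hadoop.conf.Configuration;
> >>> import org.apache.hadoop.hbase.HBaseConfiguration;
> >>> import org.apache.hadoop.hbase.client.*;
> >>> import org.apache.hadoop.hbase.util.Bytes;
> >>>
> >>> public class GetLatencyProbe {
> >>>   public static void main(String[] args) throws Exception {
> >>>     Configuration conf = HBaseConfiguration.create();
> >>>     conf.setInt("hbase.rpc.timeout", 50); // fail fast at the 50 ms budget
> >>>     HTable table = new HTable(conf, "mytable");
> >>>     long worstMs = 0;
> >>>     for (int i = 0; i < 10000; i++) {
> >>>       Get get = new Get(Bytes.toBytes("row-" + i)); // made-up key pattern
> >>>       long t0 = System.nanoTime();
> >>>       table.get(get);
> >>>       worstMs = Math.max(worstMs, (System.nanoTime() - t0) / 1000000L);
> >>>     }
> >>>     System.out.println("worst get latency: " + worstMs + " ms");
> >>>     table.close();
> >>>   }
> >>> }
> >>>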
> >>> *HBase-GC*
> >>> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
> >>> -XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
> >>> *Hadoop-GC*
> >>> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> >>>
> >>> -Vibhav
> >>
>