HBase >> mail # user >> Read thruput


Re: Read thruput
What is the typical read throughput that one gets when using HBase?

I am not able to achieve more than 3000 queries/sec with a timeout of 50
milliseconds.
Even at that rate, about 10% of the requests time out.

-Vibhav
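
A rough way to relate these figures to the "how many read threads" question raised further down the thread is Little's law (average requests in flight = arrival rate x time each request spends in the system). The sketch below uses only the numbers reported here; the variable and function names are illustrative, and it assumes the client abandons every request at the 50 ms timeout, so 50 ms is an upper bound on how long a request is outstanding.

```python
def avg_in_flight(requests_per_sec: float, latency_ms: float) -> float:
    """Little's law: average concurrency = arrival rate * time in system."""
    return requests_per_sec * latency_ms / 1000

throughput = 3000      # queries/sec reported in the thread
timeout_ms = 50        # client-side timeout reported in the thread
timeout_percent = 10   # roughly 10% of requests time out

# Upper bound on average concurrency, since no request outlives the timeout.
in_flight = avg_in_flight(throughput, timeout_ms)

# Requests per second that actually return within the timeout.
succeeded_per_sec = throughput * (100 - timeout_percent) // 100

print(f"at most {in_flight:.0f} requests in flight on average")
print(f"about {succeeded_per_sec} successful reads/sec")
```

If the client uses far fewer threads than this bound, the throughput ceiling may be on the client side rather than in the region servers; that is only a hypothesis to check, not a diagnosis.
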
On Mon, Apr 1, 2013 at 11:20 PM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:

> Yes, I have changed the block cache percentage to 0.35.
>
> -Vibhav
>
>
> On Mon, Apr 1, 2013 at 10:20 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
>> I was aware of that discussion, which was about MAX_FILESIZE and BLOCKSIZE.
>>
>> My suggestion was about block cache percentage.
>>
>> Cheers
>>
>>
>> On Mon, Apr 1, 2013 at 4:57 AM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
>>
>> > I followed the advice in this thread:
>> > http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow
>> >
>> > to reduce the block cache value.
>> >
>> > -Vibhav
>> >
>> >
>> > On Mon, Apr 1, 2013 at 4:23 PM, Ted <[EMAIL PROTECTED]> wrote:
>> >
>> > > Can you increase the block cache size?
>> > >
>> > > What version of HBase are you using?
>> > >
>> > > Thanks
>> > >
>> > > On Apr 1, 2013, at 3:47 AM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > The typical size of each of my rows is less than 1 KB.
>> > > >
>> > > > Regarding memory, I have allocated 8 GB to the HBase region servers and
>> > > > 4 GB to the datanodes, and I don't see them completely used, so I ruled
>> > > > out the GC aspect.
>> > > >
>> > > > In case you still believe that GC is an issue, I will upload the GC logs.
>> > > >
>> > > > -Vibhav
>> > > >
>> > > >
>> > > > On Mon, Apr 1, 2013 at 3:46 PM, ramkrishna vasudevan <[EMAIL PROTECTED]> wrote:
>> > > >
>> > > >> Hi
>> > > >>
>> > > >> How big are your rows? Are they wide rows, and what is the size of
>> > > >> every cell?
>> > > >> How many read threads are being used?
>> > > >>
>> > > >>
>> > > >> Were you able to take a thread dump when this was happening? Have you
>> > > >> looked at the GC log?
>> > > >> We may need some more info before we can reason about the problem.
>> > > >>
>> > > >> Regards
>> > > >> Ram
>> > > >>
>> > > >>
>> > > >> On Mon, Apr 1, 2013 at 3:39 PM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
>> > > >>
>> > > >>> Hi All,
>> > > >>>
>> > > >>> I am trying to use HBase for real-time data retrieval with a timeout of
>> > > >>> 50 ms.
>> > > >>>
>> > > >>> I am using 2 machines as datanodes and region servers,
>> > > >>> and one machine as the master for Hadoop and HBase.
>> > > >>>
>> > > >>> But I am able to fire only 3000 queries per second, and 10% of them are
>> > > >>> timing out.
>> > > >>> The database has 60 million rows.
>> > > >>>
>> > > >>> Are these figures OK, or am I missing something?
>> > > >>> I have set the scanner caching to one, because each time
>> > > >>> we fetch only a single row.
>> > > >>>
>> > > >>> Here are the various configurations:
>> > > >>>
>> > > >>> *Our schema*
>> > > >>> {NAME => 'mytable', FAMILIES => [{NAME => 'cf',
>> > > >>> DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL',
>> > > >>> REPLICATION_SCOPE => '0', COMPRESSION => 'GZ', VERSIONS => '1',
>> > > >>> TTL => '2147483647', MIN_VERSIONS => '0',
>> > > >>> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '8192',
>> > > >>> ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>> > > >>>
>> > > >>> *Configuration*
>> > > >>> 1 Machine having both hbase and hadoop master
>> > > >>> 2 machines having both region server node and datanode
>> > > >>> total 285 region servers
>> > > >>>
>> > > >>> *Machine Level Optimizations:*
>> > > >>> a) Number of file descriptors is 1000000 (ulimit -n gives 1000000)
>> > > >>> b) Increased the read-ahead value to 4096
>> > > >>> c) Added noatime,nodiratime to the disks
>> > > >>>
>> > > >>> *Hadoop Optimizations:*
>> > > >>> dfs.datanode.max.xcievers = 4096
>> > > >>> dfs.block.size = 33554432
>> > > >>> dfs.datanode.handler.count = 256
>> > > >>> io.file.buffer.size = 65536
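
Since the thread keeps coming back to the block cache percentage, a back-of-envelope comparison of cache capacity against data size may help frame the discussion. This is only a sketch using figures quoted above (60 million rows, rows under 1 KB, 8 GB region server heap, block cache fraction 0.35, two region server machines). It assumes rows are close to the 1 KB upper bound, treats GB as GiB, and ignores per-cell key overhead and GZ compression; all variable names are illustrative.

```python
GIB = 1024 ** 3

rows = 60_000_000            # "The database has 60 million rows"
max_row_bytes = 1024         # rows are reported as "less than 1 KB"
heap_gib = 8                 # heap given to each region server
block_cache_fraction = 0.35  # block cache percentage mentioned in the thread
region_server_machines = 2   # machines hosting region servers

# Upper bound on raw row data, in GiB (no key overhead, no compression).
data_gib = rows * max_row_bytes / GIB

# Total block cache across both machines, in GiB.
cache_gib = heap_gib * block_cache_fraction * region_server_machines

# Share of the raw data that could sit in the block cache at once.
cache_share = cache_gib / data_gib

print(f"raw data (upper bound): {data_gib:.1f} GiB")
print(f"total block cache:      {cache_gib:.1f} GiB")
print(f"cacheable share:        {cache_share:.1%}")
```

Under these assumptions only around a tenth of the data fits in the block cache, so if reads are spread evenly over the key space most gets would have to go to disk. Smaller real rows or a skewed access pattern would change that picture considerably.
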