Subject: RE: Reply: setMaxResultSize method in Scan
The HBase RegionServer scans in batches: the client requests the next batch from the server,
and the server reads and merges the data from cache and disk. You can control the batch size by setting:
Scan.setCaching(number of rows to send in one RPC request)
Technically speaking, this allows you to control LIMIT from the client side: stop reading from the scanner once you have enough rows. Your overhead will never be larger than the limit set by setCaching.
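To make the overhead claim concrete, here is a minimal sketch in plain Java. It is a toy model, not the HBase client API: FakeServer, Stats, and scanWithLimit are hypothetical names invented for illustration, and rows are just integers. In a real client the same pattern would roughly correspond to setting the caching value on the Scan, iterating the ResultScanner, and closing it once enough rows have been read.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of batched scanning with a client-side LIMIT.
// Everything here is hypothetical; it does not use the HBase client library.
class BatchedScanSketch {

    /** Stand-in for a server-side scanner over rows 0..totalRows-1. */
    static final class FakeServer {
        private final int totalRows;
        private int position = 0;
        int rpcCalls = 0;   // number of "next batch" requests served
        int rowsSent = 0;   // total rows shipped to the client

        FakeServer(int totalRows) {
            this.totalRows = totalRows;
        }

        /** Returns up to `caching` rows, like one RPC round trip. */
        List<Integer> nextBatch(int caching) {
            rpcCalls++;
            List<Integer> batch = new ArrayList<>();
            while (batch.size() < caching && position < totalRows) {
                batch.add(position++);
            }
            rowsSent += batch.size();
            return batch;
        }
    }

    /** Counters collected by one simulated scan. */
    static final class Stats {
        final int rowsReturned;
        final int rowsFetched;
        final int rpcCalls;

        Stats(int rowsReturned, int rowsFetched, int rpcCalls) {
            this.rowsReturned = rowsReturned;
            this.rowsFetched = rowsFetched;
            this.rpcCalls = rpcCalls;
        }
    }

    /** Reads batches until `limit` rows are consumed, then stops asking for more. */
    static Stats scanWithLimit(int totalRows, int caching, int limit) {
        FakeServer server = new FakeServer(totalRows);
        int returned = 0;
        while (returned < limit) {
            List<Integer> batch = server.nextBatch(caching);
            if (batch.isEmpty()) {
                break; // no more rows in the key range
            }
            for (int row : batch) {
                if (returned == limit) {
                    break; // the rest of this batch is the only wasted work
                }
                returned++;
            }
        }
        return new Stats(returned, server.rowsSent, server.rpcCalls);
    }

    public static void main(String[] args) {
        Stats s = scanWithLimit(1000, 10, 25);
        // prints "25 rows used, 30 fetched, 3 RPCs"
        System.out.println(s.rowsReturned + " rows used, "
                + s.rowsFetched + " fetched, " + s.rpcCalls + " RPCs");
    }
}
```

With 1000 matching rows, a batch size of 10, and a limit of 25, the client fetches 30 rows in 3 requests, so the waste is 5 rows, which is less than one batch. A smaller batch size cuts the waste but costs more round trips.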
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]
From: Weiping Qu [[EMAIL PROTECTED]]
Sent: Monday, March 17, 2014 12:19 PM
To: [EMAIL PROTECTED]
Subject: Re: Reply: setMaxResultSize method in Scan
I am running a multi-threaded (100 threads) scan test against HBase.
If a request with a given key range matches a large number of
corresponding rows in HBase, the request waits for the scan to complete.
The throughput is really low.
For testing purposes, I'd like to use LIMIT to reduce the time spent scanning
and transferring results back from HBase, and so increase the throughput.
Do you think "hbase.client.scan.max.result.size" or
setMaxResultSize (in bytes) could make HBase stop the scan at the LIMIT
before it has scanned all of the corresponding rows?
As you mentioned, there is no query optimizer in HBase, so I assume
that in this case the region servers will not stop scanning the rows in
this key range until they have all the results, and only then limit the
results sent to the client to the max size.
If so, there is not much I can do to compare the throughput with that in
relational databases like MySQL.
Mit freundlichen Grüßen / Kind Regards
University of Kaiserslautern
Department of Computer Science
Heterogeneous Information Systems Group
P.O. Box 3049
67653 Kaiserslautern, Germany
Email: qu (at) informatik.uni-kl.de
Phone: +49 631 205 3264
Fax: +49 631 205 3299