On Fri, Jan 10, 2014 at 7:55 PM, LEI Xiaofeng <[EMAIL PROTECTED]> wrote:
> Hi, Stack
> I have not tested the performance on a big cluster yet. I just tested it on two
> nodes using my C++ project. One node hosts the HMaster and the ZooKeeper
> server; the other hosts the HRegionServer. My scan performance is about
> 5293 records per second. When I read the same data from a local file, the
> performance is 208757 records per second. PS: the data are stored as a tree
> in the file.
Sounds like you are comparing different things. Yes, reading from the FS
directly will be faster than going through the HBase API, with its extra
tiers handling its 'model' (see recent notes on this list on 'performance'
for more explanation of why).
I would suggest you look at the reference guide to see how HBase lays out
data in the FS, compare that to your file layout, and then read the
performance section. Get the Java client running fast first, then compare
your C++ client to the Java result. If it is still not fast enough, come
back to the list with more detail on your setup and your data formats.