dev to bcc.
Welcome to HBase! You'll have a larger pool of people to answer your
questions if you take them first to the user mailing list.
> I'm a student in computer science and I'm trying to use Hbase above an HDFS
> to show some performance of the system. I'd like to know how works the
> regionservers cluster when i insert / retrieve data from the cluster.
For understanding the read and write paths, start by having a look at these
excellent blog posts on the topics:
> I have one master, one zookeeper server and 4 regionservers. I create a
> table and this is created only on one regionserver, in one specifically
> When I insert data, I notice that only that regionservers works, so i think
> the load is not splitted into the entire cluster. Is it right?
That's correct. By default, creating a table results in the creation of a
single region for that table. A single region is hosted by a single
RegionServer, so all data operations for that table will go to that single
region and thus the single RegionServer.
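
To make the routing concrete, here is a toy model (not HBase code) of how a
row key maps to a region. It assumes regions are defined by a sorted list of
split points and that each region covers [start key, end key); the function
name and sample keys are made up for illustration, and plain Python string
comparison stands in for HBase's byte-wise key ordering.

```python
from bisect import bisect_right


def region_index(row_key, split_points):
    """Return the index of the region whose key range contains row_key.

    split_points must be sorted. A key equal to a split point belongs to
    the region that starts at that split point.
    """
    return bisect_right(split_points, row_key)


keys = ["apple", "kiwi", "zebra"]

# A freshly created table has no split points, so there is one region
# and every key lands in it.
print([region_index(k, []) for k in keys])          # [0, 0, 0]

# With two split points there are three regions to spread keys over.
print([region_index(k, ["h", "p"]) for k in keys])  # [0, 1, 2]
```

With an empty split list every key resolves to region 0, which is why a
single RegionServer does all the work for a newly created table.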
> Furthermore, when i retrieve data, again, only that regionserver still
Also correct. Single region means only one machine is making that data
available for reads and writes.
> There is a way to split the data into different region on different
> regionservers? Because otherwise, i think there's no difference to have 1
> or 4 regionservers but to have replication performed by HDFS.
Yes, you can create a pre-split table. If you provide a list of split
points, HBase will create your table with multiple regions, one more than
the number of split points. You'll see your operations against that table
distributed across those regions, provided your row keys are spread across
the split ranges.
The balancer will ensure those regions are distributed evenly across all
the RegionServers in your cluster. You can also trigger table splits from
the hbase shell; try `help 'split'` for info. You can probably find useful
information on these topics in our online book:
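
As a rough sketch of pre-splitting for your 4-RegionServer setup, the Python
below computes evenly spaced split points and prints a shell `create` command
that uses the `SPLITS` option. It assumes your row keys begin with a
uniformly distributed two-digit hex prefix (for example, from hashing the
key); the table name `perf_test` and column family `f` are placeholders, so
adjust all of these to your schema.

```python
def even_split_points(num_regions, key_width=2):
    """Return num_regions - 1 hex strings that divide the space of
    key_width-digit hex prefixes into equal ranges."""
    if num_regions < 2:
        return []
    key_space = 16 ** key_width
    if num_regions > key_space:
        raise ValueError("more regions than distinct key prefixes")
    return [
        format(i * key_space // num_regions, "0{}x".format(key_width))
        for i in range(1, num_regions)
    ]


def shell_create_command(table, family, split_points):
    """Build an hbase shell `create` command with explicit split points."""
    quoted = ", ".join("'{}'".format(p) for p in split_points)
    return "create '{}', '{}', SPLITS => [{}]".format(table, family, quoted)


# Four regions, so each of the 4 RegionServers can host one.
points = even_split_points(4)
print(shell_create_command("perf_test", "f", points))
# create 'perf_test', 'f', SPLITS => ['40', '80', 'c0']
```

Paste the printed command into the hbase shell. If your real row keys are
not evenly spread over the hex prefixes (sequential IDs or timestamps, say),
the writes will still pile up on one region, so choose split points that
match your actual key distribution.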