I have a small Hadoop and HBase cluster with 4 nodes all acting as datanodes
and regionservers, with replication set to 3. I am bulk loading data into
HBase using the importtsv program, writing heavily to one table that
initially had no data in it and only 1 region. I'll call this tableA.
In HBase, I already had a table (tableB) with about 400 regions. These
regions were evenly distributed across the four nodes I have.
Here is the behavior I am observing with my bulk import of data: Initially,
one regionserver was assigned the regions for tableA, so it got all the
initial requests. When the number of regions became unbalanced across the
four nodes, regions for tableB (my old table) were reassigned to the other
regionservers, rather than any regions from my newer table (tableA). This
means that the one node continues to be hit with all the import requests,
which is slowing down my import.
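
To make the behavior concrete, here is a toy Python model of what I think I am seeing. It is not HBase's balancer code, and I do not know that HBase works this way. It assumes a balancer that only evens out the total region count per server and picks regions to move without looking at which table they belong to. The server names, region counts, and the function balance_by_total_count are all made up for illustration.

```python
def balance_by_total_count(assignments):
    """Toy balancer (an assumption, not HBase's real algorithm).

    assignments maps server name -> list of (table, region_id).
    Regions are moved one at a time from the busiest server to the
    idlest one until the counts differ by at most 1. The region that
    gets moved is simply the first in the busiest server's list, so
    the table it belongs to is never considered.
    """
    moves = []
    while True:
        busiest = max(assignments, key=lambda s: len(assignments[s]))
        idlest = min(assignments, key=lambda s: len(assignments[s]))
        if len(assignments[busiest]) - len(assignments[idlest]) <= 1:
            break
        region = assignments[busiest].pop(0)
        assignments[idlest].append(region)
        moves.append((region, busiest, idlest))
    return moves


# Hypothetical starting point: about 400 tableB regions spread evenly.
servers = {
    f"rs{i}": [("tableB", i * 100 + j) for j in range(100)]
    for i in range(1, 5)
}

# tableA starts on rs1 and splits there as the import proceeds.
servers["rs1"] += [("tableA", k) for k in range(8)]

moves = balance_by_total_count(servers)

for (table, region_id), src, dst in moves:
    print(f"moved {table} region {region_id}: {src} -> {dst}")
print({name: len(regions) for name, regions in servers.items()})
```

In this model the total counts end up even, but every region that moves belongs to tableB and all tableA regions stay on rs1, which matches what I observe on my cluster.
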
How does HBase decide which regions to reassign when balancing, or is it
relatively arbitrary? Is there anything I can do at this point to force
regions of tableA to be assigned to other regionservers?