> Would it make sense to give remote blocks higher priority over the local
blocks that can be read via SCR and not let them get evicted if there is a
tie in which block to evict ?
That sounds like a reasonable idea. As are the others.
But first, could this be a bug?
What version of HBase? Were you able to take and save stack traces from the
offending RegionServers while they were engaged in this behavior? (If not,
maybe next time?) What were the HBase clients doing at the time that might
correlate? Are there a set of steps that can be distilled that tend
to trigger what you are seeing?
On Monday, July 8, 2013, Viral Bajaria wrote:
> Trying to make a case for making the block eviction strategy smart and to
> not evict remote blocks more frequently and make the requests more smarter.
> The question here comes after I debugged the issue that I was having with
> random region servers hitting high load averages. I initially thought the
> problem was hardware related i.e. bad disk or network since the wait I/O
> was too high but it was a combination of things.
> I figured with SCR (short circuit read) ON the datanode should almost never
> show high amount of block requests from the local regionservers. So my
> starting point for debugging was the datanode since it was doing a ton of
> I/O. The clienttrace logs helped me figure out which RS nodes were making
> block requests. I hacked up a script to report which blocks are being
> requested and how many times per minute. I found that some blocks were
> being requested 10+ times in a minute and over 2000 times in an hour from
> the same regionserver. This was causing the server to do 40+MB/s on reads
> alone. That was on the higher side, the average was closer to 100 or so per
> Now why did I end up in such a situation. It happened due to the fact that
> I added servers to the cluster and rebalanced the cluster. At the same time
> I added some drives and also removed the offending server in my setup. This
> caused some of the data to not be co-located with the regionservers. Given
> that major_compaction was disabled and it would not have run for a while
> (atleast on some tables) these block requests would not go away. One of my
> regionservers was totally overwhelmed. I made the situation worse when I
> removed the server that was under heavy load with the assumption that it's
> a hardware problem with the box without doing a deep dive (doh!). Given
> that regionservers will be added in the future, I expect block locality to
> go down till major_compaction runs. Also nodes can go down and cause this
> problem. So I started thinking of probable solutions, but first some
> - The surprising part was the regionservers were trying to make so many
> requests for the same block in the same minute (let alone hour). Could this
> happen because the original request took a few seconds and so the
> regionserver re-requested ? I didn't see any fetch errors in the
> regionserver logs for blocks.
> - Even more strange; my heap size was at 11G and the time when this was
> happening, the used heap was at 2-4G. I would have expected the heap to
> grow higher than that since the blockCache should be using atleast 40% of
> the available heap space.
> - Another strange thing that I observed was, the block was being requested
> from the same datanode every single time.
> *Possible Solution/Changes*
> - Would it make sense to give remote blocks higher priority over the local
> blocks that can be read via SCR and not let them get evicted if there is a
> tie in which block to evict ?
> - Should we throttle the number of outgoing requests for a block ? I am not
> sure if my firewall caused some issue but I wouldn't expect multiple block
> fetch requests in the same minute. I did see a few RST packets getting
> dropped at the firewall but I wasn't able to trace the problem was due to
> - We have 3 replicas available, shouldn't we request from the other
Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)