In case you didn't get what was just said:
When a regionserver crashes, HBase needs to replay the contents of
outstanding WALs (write-ahead logs, a.k.a. HLogs) before it can bring the
regions the crashed server was hosting back online. Replaying
involves reading the WALs and then 'splitting' the edits by region so that
when a region is opened in a new location, it has a nice neat file of its
own edits, and only its edits, to replay before it starts serving.
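
To make the 'splitting' step concrete, here is a rough sketch of the idea in Python. It is not HBase's actual code; the function name, the (region, edit) tuple layout and the sample region names are all made up for illustration.

```python
from collections import defaultdict


def split_edits_by_region(wal_entries):
    """Group WAL entries by region, keeping each region's edits in log order.

    wal_entries is assumed to be an iterable of (region_name, edit) pairs,
    which is a simplification of what a real WAL entry carries.
    """
    per_region = defaultdict(list)
    for region_name, edit in wal_entries:
        per_region[region_name].append(edit)
    return dict(per_region)


# Hypothetical log contents from a crashed server hosting two regions.
wal = [
    ("region-a", "put row1"),
    ("region-b", "put row9"),
    ("region-a", "delete row2"),
]
edits = split_edits_by_region(wal)
```

Each key in `edits` stands in for the per-region file that the region replays when it is opened on its new server.
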
The two metrics you cite are time and size histograms for this WAL split
process. The size histogram is updated with the size of all the WALs
each time the splitting process is run. The time histogram is updated with
how long the split process took. These metrics are sort of important. You
want the time to be small, since it is time during which some of your data
is offline. Smaller log splitting sizes will usually mean less time, so it
pays to try and keep the number of outstanding WALs low.
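
As a rough picture of how two such histograms get fed, here is a small Python sketch. The class and attribute names are hypothetical, it is not the HBase metrics code, and it assumes the recorded size is the combined size of the WALs handled in one split run.

```python
class SplitMetrics:
    """Toy stand-in for a pair of split histograms (size and time)."""

    def __init__(self):
        self.split_sizes_bytes = []  # one sample per split run
        self.split_times_ms = []     # one sample per split run

    def record_split(self, wal_sizes_bytes, elapsed_ms):
        # Assumption: one size sample is the sum of all WAL sizes in the run.
        self.split_sizes_bytes.append(sum(wal_sizes_bytes))
        self.split_times_ms.append(elapsed_ms)

    @property
    def num_ops(self):
        # Number of split runs recorded so far.
        return len(self.split_times_ms)


metrics = SplitMetrics()
metrics.record_split([64, 128], elapsed_ms=1500)
metrics.record_split([32], elapsed_ms=400)
```

Under these assumptions, fewer or smaller outstanding WALs show up as smaller size samples, and usually as smaller time samples too.
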
Our metrics are better now, rationalized (though the num_ops suffix here
seems 'off'). They could do with a bit of doc'ing. What is in the refguide
is a bit stale.
On Thu, Feb 6, 2014 at 5:31 PM, Ted Yu <[EMAIL PROTECTED]> wrote: