Jonathan Hsieh 2013-12-03, 06:20
Re: [Shadow Regions / Read Replicas] Wal per region?
Devaraj Das 2013-12-03, 19:21
On Mon, Dec 2, 2013 at 10:20 PM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> > Devaraj:
> > Jonathan Hsieh, WAL per region (WALpr) would give you the locality (and
> hence HDFS short
> > circuit) of reads if you were to couple it with the favored nodes. The
> cost is of course more WAL
> > files... In the current situation (no WALpr) it would create quite some
> traffic cross machine, no?
> I think we all agree that wal per region isn't efficient in today's
> spinning hard drive world, where we are limited to a relatively low budget
> of seeks (though it may be better in the future with SSDs).
Yes, agreed on this. My thinking is also that with SSDs, having many
files per RS would hopefully not be an issue.
> With this in mind, I am actually making the case that we would group all
> the regions from RS-A onto the same set of preferred region servers. This
> way we only need to have one or two other RSs tailing the RS.
> So for example, if regions X, Y and Z were on RS-A and in its hlog, the shadow
> region memstores for X, Y, and Z would be assigned to the same one or two
> other RSs. Ideally this would be where the HLog file replicas have
> locality (helped by favored nodes/block affinity). Doing this, we hold the
> number of readers on the active hlogs to a constant number, do not add any
> new cross machine traffic (though tailing currently has costs on the NN).
Yes, we did consider this, but the issue is how complex the failure
handling would be in order to maintain the grouping of the regions. So,
for example, if RS-A goes down, would the master be able to choose another
RS-A' quickly to maintain the grouping of the regions in RS-A? Or do we
then fall back to the regular single-region assignments and have the
balancer group the regions back? What's the grouping size? The same
issues apply to the assignments of the shadows.
But having said that, I agree that if WALpr is expensive on non-SSD
hardware, for example, we need to address the region grouping issues.
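
To make the grouping proposal concrete, here is a minimal sketch of it, not anything from the HBase code base. The function names and the ring-order placement rule are assumptions made for illustration; a real assignment would prefer servers that hold replicas of the hlog blocks, and it would also need the failure handling discussed above.

```python
from collections import defaultdict


def assign_shadow_groups(region_to_primary, servers, shadows_per_group=2):
    """Map every region to the shadow servers chosen for its primary RS.

    All regions hosted by one primary share the same shadow servers, so
    each primary's hlog is tailed by a constant number of readers.
    """
    servers = sorted(servers)
    if len(servers) <= shadows_per_group:
        raise ValueError("need more servers than shadows per group")
    group_shadows = {}
    for primary in sorted(set(region_to_primary.values())):
        start = servers.index(primary)
        # Hypothetical placement rule: take the next servers in ring order.
        # A real system would prefer nodes holding hlog block replicas.
        group_shadows[primary] = [
            servers[(start + i) % len(servers)]
            for i in range(1, shadows_per_group + 1)
        ]
    return {
        region: group_shadows[primary]
        for region, primary in region_to_primary.items()
    }


def tailers_per_primary(region_to_primary, region_to_shadows):
    """Count the distinct servers tailing each primary's hlog."""
    tailers = defaultdict(set)
    for region, primary in region_to_primary.items():
        tailers[primary].update(region_to_shadows[region])
    return {primary: len(found) for primary, found in tailers.items()}
```

With regions X, Y and Z on RS-A, all three shadows land on the same two servers, so RS-A's hlog has two tailers regardless of how many regions it hosts.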
> One inefficiency we have is that if there is a single log per RS, we end up
> reading all the log entries for tables that may not have the shadow feature
> enabled. However, with HBase multi-wals coming, one strategy is to shard
> wals to a number on the order of the number of disks on a machine (12-24
> these days). I think sharding the hlog with a wal per namespace (this could
> be used to have a wal per table) would make sense. This way of sharding the
> hlog would reduce the amount of reading of irrelevant log entries in a log
> tailing scheme. It would have the added benefit of reducing the log
> splitting work, reducing MTTR, and allowing for recovery priorities if the
> primaries and shadows also go down. (This is a generalization of the
> idea of separating META out into its own log.)
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // [EMAIL PROTECTED]
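
The effect of the per-namespace sharding described in the quoted message can be illustrated with a small sketch. This is a toy model under stated assumptions, not HBase's WAL implementation: entries are plain `(table, edit)` tuples, and the helper names `namespace_of`, `shard_by_namespace` and `entries_tailed` are hypothetical.

```python
from collections import defaultdict


def namespace_of(table):
    """Return the namespace of a "namespace:table" name, else "default"."""
    return table.split(":", 1)[0] if ":" in table else "default"


def shard_by_namespace(entries):
    """Split (table, edit) log entries into one log per namespace."""
    wals = defaultdict(list)
    for table, edit in entries:
        wals[namespace_of(table)].append((table, edit))
    return dict(wals)


def entries_tailed(wals, shadow_tables):
    """Count entries a tailer must read to follow the shadow-enabled tables.

    The tailer reads every entry of each log that covers the namespace of
    at least one shadow-enabled table, and skips the other logs entirely.
    """
    needed = {namespace_of(table) for table in shadow_tables}
    return sum(len(log) for ns, log in wals.items() if ns in needed)
```

With a single log per RS the tailer reads every entry; with one log per namespace it reads only the logs whose namespaces contain shadow-enabled tables.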