In implementing HBASE-10569 (co-locating meta and master), I ran into an issue with the connections.
The issue is that ClusterConnection is package private (on purpose). I have to create an adapter (see the patch here https://reviews.apache.org/r/19198/) so that I can override some of the logic. Because meta and master are in the same JVM, I'd like to bypass the network/RPC layer when the master scans the meta table or assigns the meta region, and when the same regionserver sends reports to the master.
I was wondering what we can do here. Is creating an adapter a good solution? That's kind of similar to making the connection public, right?
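To make the adapter idea concrete, here is a minimal sketch of the short-circuit pattern. It is not the code in the review request above. `Connection`, `MetaService`, `ShortCircuitConnection` and the server names are hypothetical stand-ins invented for illustration, not HBase interfaces; the real ClusterConnection has a much larger surface.

```java
public class ShortCircuitSketch {
    /** Hypothetical stand-in for a service stub handed out by a connection. */
    interface MetaService {
        String locateRegion(String rowKey);
    }

    /** Hypothetical stand-in for the connection; NOT the HBase ClusterConnection. */
    interface Connection {
        MetaService getMetaService(String serverName);
    }

    /**
     * Adapter that wraps an existing connection. Requests aimed at the local
     * server get the in-process service directly; everything else is delegated
     * to the wrapped (RPC-backed) connection.
     */
    static class ShortCircuitConnection implements Connection {
        private final Connection delegate;
        private final String localServerName;
        private final MetaService localService;

        ShortCircuitConnection(Connection delegate, String localServerName,
                               MetaService localService) {
            this.delegate = delegate;
            this.localServerName = localServerName;
            this.localService = localService;
        }

        @Override
        public MetaService getMetaService(String serverName) {
            if (localServerName.equals(serverName)) {
                return localService; // same JVM: skip the network/RPC layer
            }
            return delegate.getMetaService(serverName);
        }
    }

    public static void main(String[] args) {
        // Fake "remote" connection that just tags its answers, for demonstration only.
        Connection remote = server -> row -> "rpc:" + server + ":" + row;
        MetaService local = row -> "local:" + row;
        Connection conn = new ShortCircuitConnection(remote, "master-1", local);

        System.out.println(conn.getMetaService("master-1").locateRegion("row-a"));
        System.out.println(conn.getMetaService("rs-2").locateRegion("row-a"));
    }
}
```

The catch, as asked above, is that the adapter has to live where it can see the package-private type, so it widens access in much the same way as making the connection public would.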
On Fri, Mar 14, 2014 at 12:22 PM, Jimmy Xiang <[EMAIL PROTECTED]> wrote:
One thought I was having this morning about your fancy patch, Jimmy, is that making it so the master regionserver has the meta region only might not be the way to go. Rather than have a single 'special' meta region, we might want to distribute it around the cluster -- i.e. let it split (like the Accumulo fellows do) -- so that when meta is offline, it is less of a body blow.
That would mean that though the meta was on the same server as the master, you'd access it as you would any other region.
So, I'm asking if we should be going the above route at all?
That means there will be many small meta regions. If we just have one instance of each region, that should help. But we are moving towards HA regions, right?
On Fri, Mar 14, 2014 at 12:31 PM, Stack <[EMAIL PROTECTED]> wrote:
Taking advantage of region replicas will require the indirection and a potential network hop. It could be that a "short-circuit" local read optimization is possible, but I don't think it's worth it for scanning meta.
On Friday, March 14, 2014, Stack <[EMAIL PROTECTED]> wrote:
I was in favor of co-locating, because we had the "meta is one region" design for so long, our regions are big, and we did not spend much time on master redesign. However, in an ideal case, we should be going with the splittable meta design from BT, and shoot for regions sized around the HDFS block size (128/512 MB) and having millions of regions. The reason we currently get away with a single meta region is that our regions can be 10-20 GB, so 100K regions would be enough to address 1-2 PB of data. It seems clear that we do not want two state machines, one in master and one in meta per region, which can diverge and make AM the hell that it is today. One way to ease this is to move meta into master and ensure master in-memory == meta. The other way would be to make master stateless and meta the only authoritative source. I would vote for the latter.
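As a rough check on the sizing argument above, the sketch below counts how many regions are needed to address a given amount of data at different region sizes. The class and method names are made up for illustration, and it assumes binary units (1 PB = 2^50 bytes), which is why the 10-20 GB case lands slightly above 100K.

```java
public class RegionCountSketch {
    static final long MB = 1L << 20;
    static final long GB = 1L << 30;
    static final long PB = 1L << 50;

    /** Regions needed to hold dataBytes at the given region size, rounded up. */
    static long regionsNeeded(long dataBytes, long regionBytes) {
        return (dataBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        // Today's large regions: roughly 100K regions cover 1-2 PB.
        System.out.println("1 PB at 10 GB/region:  " + regionsNeeded(PB, 10 * GB));
        System.out.println("2 PB at 20 GB/region:  " + regionsNeeded(2 * PB, 20 * GB));

        // Block-sized regions: the same data needs millions of regions.
        System.out.println("1 PB at 512 MB/region: " + regionsNeeded(PB, 512 * MB));
        System.out.println("1 PB at 128 MB/region: " + regionsNeeded(PB, 128 * MB));
    }
}
```

With block-sized regions the count goes from about a hundred thousand to several million, which is the scale at which a single unsplittable meta region becomes the limiting factor.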
Coming to the ClusterConnection, I thought that CoprocessorHConnection is kind of similar. It should be fine to have an in-process ClusterConnection implementation.
Enis
On Fri, Mar 14, 2014 at 3:23 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote: