Re: HBase Replication use cases


On Apr 12, 2012, at 2:50 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Thanks Himanshu,
>
> We're planning to use replication across data centers for DR (and we have added a bunch of stuff and fixed bugs in replication).
>
>
> We'll have it always on, and only use stop/start_peer (new in 0.94+) to temporarily stop replication, rather than stop/start_replication.
> HBASE-2611 is a problem. We have not had time recently to work on it.
>
> i) and ii) can be worked around by forcing a log roll on all region servers after replication is enabled. Replication would then be considered started once the logs have been rolled... But that is quite annoying.
>

Should we consider adding this as part of the replication code proper? Is there a smarter way to go about it?
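
To make corner cases i) and ii) and the log-roll workaround concrete, here is a toy model of the queueing behaviour described in this thread. It is only a sketch: `ToyCluster`, `ToyRegionServer`, `force_roll` and `current_log_queued` are hypothetical names made up for illustration, not HBase classes or APIs, and the only rule it encodes is that a log is enqueued at the moment it is opened, and only if the replication state is true at that moment.

```python
# Toy model only: every name here is hypothetical, none of this is HBase code.

class ToyRegionServer:
    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.log_seq = 0
        self.current_log = None
        self.queue = []      # logs that will be replicated
        self.roll_log()      # a starting server opens its first log

    def roll_log(self):
        """Open a new log; enqueue it only if replication is on right now."""
        self.log_seq += 1
        self.current_log = f"{self.name}-log{self.log_seq}"
        if self.cluster.replicating:
            self.queue.append(self.current_log)


class ToyCluster:
    def __init__(self, replicating=True):
        self.replicating = replicating
        self.servers = []

    def start_server(self, name):
        rs = ToyRegionServer(name, self)
        self.servers.append(rs)
        return rs

    def stop_replication(self):
        self.replicating = False

    def start_replication(self, force_roll=False):
        self.replicating = True
        if force_roll:
            # Workaround from this thread: roll every log after enabling,
            # so each server's new current log gets a chance to be enqueued.
            for rs in self.servers:
                rs.roll_log()


def current_log_queued(rs):
    return rs.current_log in rs.queue


# Case i): stop, log roll, start -> the current log missed the queue.
cluster = ToyCluster(replicating=True)
rs1 = cluster.start_server("rs1")
cluster.stop_replication()
rs1.roll_log()
cluster.start_replication()
print(current_log_queued(rs1))   # False

# Case ii): server started while replication is off.
cluster.stop_replication()
rs2 = cluster.start_server("rs2")
cluster.start_replication()
print(current_log_queued(rs2))   # False

# Workaround: force a roll on all servers after enabling replication.
cluster.start_replication(force_roll=True)
print(current_log_queued(rs1), current_log_queued(rs2))   # True True
```

In this model the logs that were skipped stay skipped, which matches the idea that replication is only considered started once the logs have been rolled.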

- Jesse

> Is iii) still a problem in 0.92+? I thought we fixed that together with a).
>
> -- Lars
>
> ________________________________
> From: Himanshu Vashishtha <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Thursday, April 12, 2012 12:11 PM
> Subject: HBase Replication use cases
>
> Hello All,
>
> I have been testing HBase replication (0.90.4 and 0.92 variants).
>
> Here are some of the findings:
>
> a) 0.90+ does not handle znode changes well; during an
> ongoing replication, if I delete a peer and a region server then goes to
> the znode to update the log status, the region server aborts itself
> when it sees the missing znode.
>
> Recoverable ZooKeeper seems to have fixed this in 0.92+?
>
> 0.92 has a lot of new features (start/stop handle, master-master, cyclic).
>
> But there are corner cases with the start/stop switches.
> i)  A log is enqueued when the replication state is set to true. When we
> start the cluster, the state is true and the starting region server takes the
> new log into the queue. If I do a stop_replication, and there is a log
> roll, and then I do a start_replication, the current log will not be
> replicated, as it has missed the opportunity of being added to the queue.
>
> ii) If I _start_ a region server when the replication state is set to
> false, its log will not be added to the queue. Now, if I do a
> start_replication, its log will not be replicated.
>
> iii) Removing a peer doesn't result in a master region server abort, but
> if zk is down and there is a log roll, it will abort. Not a
> serious one, as zk is down so the cluster is not healthy anyway.
>
> I was looking for jiras (including 2611), and stumbled upon 2223. I
> don't think there is anything like time-based partition behavior (as
> mentioned in the jira description), though the patch has a lot of other
> nice things which are indeed in the existing code. Please correct me if I
> have missed anything.
>
> Having said that, I wonder how other folks out there use it:
> their experience, and the common issues (minor and major) they come across.
> I did find a ppt by Jean Daniel at OSCON that mentions using it in
> SU production.
>
> I plan to file jiras for the above ones and will start digging in.
>
> Looking forward to your responses.
>
> Thanks,
> Himanshu