HBase >> mail # user >> HBase Cyclic Replication Issue: some data are missing in the replication for intensive write


Jean-Daniel Cryans 2012-04-23, 17:57
Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write
Hello Jerry,

Did you try this again?

The next time you try, could you please share the logs somehow?

I tried to reproduce your scenario today, but had no luck. I used the same
workload you copied here. The master cluster has 5 nodes and the slave
has just 2 nodes, and I made tiny regions of 8 MB (memstore flushing at
8 MB too), so that I have 1200+ regions even for 200k rows. I ran
the workload with 16, 24 and 32 client threads, but the verifyrep
MapReduce job says it's good.
Yes, I ran the verifyrep command after seeing the "there is nothing to
replicate" message on all the regionservers; sometimes it was a bit
slow.
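
For anyone following along, here is a rough illustration of the kind of
check a verifyrep-style comparison performs. It is only a conceptual
sketch in Python, not the actual VerifyReplication MapReduce job: the
in-memory dict "tables", the sample rows and the compare_tables name are
all assumptions made for illustration. It walks the source table and
counts a row as good only if the peer holds identical cells for it.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-ins for HBase tables:
# row key -> {"family:qualifier": value}
Row = Dict[str, bytes]
Table = Dict[str, Row]


def compare_tables(source: Table, peer: Table) -> Tuple[int, int, List[str]]:
    """Compare every source row against the peer.

    Returns (good_rows, bad_rows, bad_row_keys). A row is "bad" when it is
    missing on the peer or when its cells differ from the source.
    """
    good = 0
    bad_keys: List[str] = []
    for row_key in sorted(source):
        if peer.get(row_key) == source[row_key]:
            good += 1
        else:
            bad_keys.append(row_key)
    return good, len(bad_keys), bad_keys


# Made-up sample data: row2 differs on the peer, row3 never arrived.
master = {
    "row1": {"cf:a": b"1"},
    "row2": {"cf:a": b"2"},
    "row3": {"cf:a": b"3"},
}
slave = {
    "row1": {"cf:a": b"1"},
    "row2": {"cf:a": b"stale"},
}

good_rows, bad_rows, bad_row_keys = compare_tables(master, slave)
print(f"good={good_rows} bad={bad_rows} bad_keys={bad_row_keys}")
```

Note that this sketch only iterates over the source side, so a row that
exists on the peer but not on the source would go unnoticed. It also has
to run after replication has drained, otherwise rows still in flight
show up as bad.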
Thanks,
Himanshu

On Mon, Apr 23, 2012 at 11:57 AM, Jean-Daniel Cryans
<[EMAIL PROTECTED]> wrote:
>> I will try your suggestion today with a master-slave replication enabled from Cluster A -> Cluster B.
>
> Please do.
>
>> Last Friday, I tried to limit the variability (the moving parts) of the replication components. I reduced Cluster B to only 1 regionserver and had Cluster A replicate data from one region only, without region splitting (so I have a 1-to-1 region replication setup). During the benchmark, I moved the region between different regionservers in Cluster A (note there are still 3 regionservers in Cluster A). I ran this test 5 times and no data were lost. Does it mean something? My feeling is that there are some glitches or corner cases that have not been covered in cyclic replication (or HBase replication in general). Note that this happens only when the load is high.
>
> And have you looked at the logs? Any obvious exceptions coming up?
> Replication uses the normal HBase client to insert the data on the
> other cluster and this is what handles regions moving around.
>
>>
>> By the way, why do we need to have a zookeeper not handled by hbase for the replication to work (it is described in the hbase documentation)?
>
> It says you *should* do it, not you *need* to do it :)
>
> But basically replication is zk-heavy and getting a better
> understanding of it starts with handling it yourself.
>
> J-D