HBase, mail # user - Replication not suited for intensive write applications?

Re: Replication not suited for intensive write applications?
lars hofhansl 2013-06-21, 11:38
Another thought...

I assume you only write to a single table, right? How large are your rows on average?
Replication will send 64 MB blocks by default (or 25000 edits, whichever is smaller). The default HTable buffer is only 2 MB, so the slave RS receiving a block of edits (assuming it is a full block) has to do 32 rounds of splitting the edits per region in order to apply them (64 MB / 2 MB = 32 flushes).
There is no setting specifically targeted at the buffer size for replication, but maybe you could increase "hbase.client.write.buffer" to 64mb (67108864) on the slave cluster and see whether that makes a difference. If it does we can (1) add a setting to control the ReplicationSink HTable's buffer size, or (2) just have it match the replication buffer size "replication.source.size.capacity".
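A minimal sketch of that experiment, assuming hbase-site.xml on the slave cluster's region servers is where the override goes (67108864 bytes = 64 MB, chosen here only to match the default replication block size):

```xml
<!-- hbase-site.xml on the slave cluster (assumption: applied to the
     region servers hosting the ReplicationSink). 64 MB matches the
     default replication.source.size.capacity, so a full shipped block
     fits in a single client-side write buffer. -->
<property>
  <name>hbase.client.write.buffer</name>
  <value>67108864</value>
</property>
```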
-- Lars
From: lars hofhansl <[EMAIL PROTECTED]>
Sent: Friday, June 21, 2013 1:48 AM
Subject: Re: Replication not suited for intensive write applications?
Thanks for checking... Interesting. So talking to 3 RSs as opposed to only 1 before had no effect on the throughput?

Would be good to explore this a bit more.
Since our RPC is not streaming, latency will affect throughput. In this case there is latency while all edits are shipped to the RS in the slave cluster and then extra latency when applying the edits there (which are likely not local to that RS). A true streaming API should be better. If that is the case, compression *could* help (but that is a big if).

The single thread shipping the edits to the slave should not be an issue as the edits are actually applied by the slave RS, which will use multiple threads to apply the edits in the local cluster.
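The shipper/applier split described above can be modeled with a toy program. This is not HBase internals, just an illustration of why one shipping thread need not be a bottleneck when the receiving side fans the work out; all names here are made up:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model: a single "shipper" submits one batch of edits, and the
// receiving side applies them with a pool of worker threads.
public class SinkModel {
    static int applyBatch(List<String> edits, int workers) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger applied = new AtomicInteger();
        for (String edit : edits) {
            // "Apply" one edit; in a real sink this would be a write
            // to the local cluster.
            pool.submit(applied::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return applied.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> batch = List.of("e1", "e2", "e3", "e4", "e5");
        // One caller thread, three applier threads.
        System.out.println(applyBatch(batch, 3)); // 5
    }
}
```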

Also my first reply - upon re-reading it - sounded a bit rough, that was not intended.

-- Lars
----- Original Message -----
From: Asaf Mesika <[EMAIL PROTECTED]>
Sent: Thursday, June 20, 2013 10:16 PM
Subject: Re: Replication not suited for intensive write applications?

Thanks for the taking the time to answer!
My answers are inline.

On Fri, Jun 21, 2013 at 1:47 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> I see.
> In HBase you have machines for both CPU (to serve requests) and storage
> (to hold the data).
> If you only grow your cluster for CPU and you keep all RegionServers 100%
> busy at all times, you are correct.
> Maybe you need to increase replication.source.size.capacity and/or
> replication.source.nb.capacity (although I doubt that this will help here).
I was thinking of giving it a shot, but theoretically it should not have an
effect, since I'm not doing anything in parallel, right?
> Also a replication source will pick region server from the target at
> random (10% of them at default). That has two effects:
> 1. Each source will pick exactly one RS at the target: ceil (3*0.1)=1
> 2. With such a small cluster setup the likelihood is high that two or more
> RSs in the source will happen to pick the same RS at the target. Thus
> leading to less throughput.
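The peer-selection arithmetic above can be sketched as a small program (the names are illustrative, not HBase code; the 0.1 default corresponds to replication.source.ratio):

```java
// Each replication source picks ceil(numSlaveRS * ratio) region
// servers from the slave cluster as shipping targets.
public class PeerSelection {
    static int peersPicked(int slaveRegionServers, double ratio) {
        return (int) Math.ceil(slaveRegionServers * ratio);
    }

    public static void main(String[] args) {
        // 3-node slave cluster, default ratio 0.1: ceil(0.3) = 1,
        // so every source picks exactly one target RS.
        System.out.println(peersPicked(3, 0.1)); // 1
        // Raising the ratio to 1.0 spreads load across all 3.
        System.out.println(peersPicked(3, 1.0)); // 3
    }
}
```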
You are absolutely correct. In Graphite, in the beginning, I saw that only
one slave RS was getting all replicateLogEntries RPC calls. I searched the
master RS logs and saw "Choose Peer" as follows:
Master RS 74: Choose peer 83
Master RS 75: Choose peer 83
Master RS 76: Choose peer 85
For some reason, they ALL talked to 83 (which seems like a bug to me).

I thought I had nailed the bottleneck, so I changed the factor from 0.1 to
1. It had the exact effect you described, and now all RSs were getting the
same amount of replicateLogEntries RPC calls, BUT it didn't budge the
replication throughput. When I checked the network card usage I understood
that even when all 3 RSs were talking to the same slave RS, the network
wasn't the bottleneck.
> In fact your numbers might indicate that two of your source RSs might have
> picked the same target (you get 2/3 of your throughput via replication).
I'll try getting two clusters of 10 RSs each and see if that helps. I
suspect it won't. My hunch is that since we're replicating with no more
than 10 threads, if I take my client, set it to 10 threads and measure the
throughput, that will be the maximum replication throughput. Thus, if my
client writes with, let's say, 20 threads (or two clients with 10 threads
each), then I'm bound to reach an ever-increasing replication backlog.

I measured throughput and got 18 MB/sec, which is bigger than the
replication throughput of 11 MB/sec, so I concluded the hard drives
couldn't be the bottleneck here.

I was thinking of somehow tweaking HBase a bit for my use case: I always
send Puts with new row KVs (never updates or deletes), thus ordering is of
no importance to me. Maybe a flag could enable, on a certain column family,
opening multiple threads at the ReplicationSource?

One more question - keeping the one thread in mind here, having compression
on the replicateLogEntries RPC call shouldn't really help here, right?
Since the entire RPC call time is mostly the time it takes to run the
HTable.batch call on the slave RS, right? If I enable compression somehow
(hack the HBase code to test-drive it), I will only speed up the transfer
time of the batch to the slave RS, but still wait on the insertion of that
batch into the slave cluster.