Re: Replication not suited for intensive write applications?
I think that, the same way writing with more clients helped throughput,
writing with only one replication thread will hurt it. In both cases the
client has to read something (a file from HDFS or the WAL) and then ship
it, so with more clients you can utilize the cluster better, since any
single client isn't consistently writing.

I agree with Asaf's assessment that it's possible that you can write
faster into HBase than you can replicate from it if your clients are
using the write buffers and have a bigger aggregate throughput than
replication's.
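
[Editor's note: a minimal sketch (not from the thread) of the kind of buffered
client writes being referred to, against the 0.94-era client API; the table,
family, and qualifier names are made up. Several such clients writing in
parallel are what give the bigger aggregate throughput mentioned above.]

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedWriterSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table");   // hypothetical table name
    table.setAutoFlush(false);                     // let puts accumulate in the client-side write buffer
    table.setWriteBufferSize(2L * 1024 * 1024);    // hbase.client.write.buffer default: 2 MB
    for (int i = 0; i < 100000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
      table.put(put);                              // buffered; an RPC only goes out when the buffer fills
    }
    table.flushCommits();                          // push whatever is left in the buffer
    table.close();
  }
}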

I'm not sure if it's really a problem, though.

J-D

On Fri, Jun 21, 2013 at 3:05 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> Hmm... Yes. Was worth a try :)  Should've checked and I even wrote that part of the code.
>
> I have no good explanation then, and also no good suggestion about how to improve this.
>
>
>
> ________________________________
>  From: Asaf Mesika <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
> Sent: Friday, June 21, 2013 5:50 AM
> Subject: Re: Replication not suited for intensive write applications?
>
>
> On Fri, Jun 21, 2013 at 2:38 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>> Another thought...
>>
>> I assume you only write to a single table, right? How large are your rows
>> on average?
>>
> I'm writing to 2 tables: avg row size for the 1st table is 1500 bytes, and
> the second is around 800 bytes.
>
>>
>> Replication will send 64 MB blocks by default (or 25000 edits, whichever is
>> smaller). The default HTable buffer is only 2 MB, so the slave RS receiving
>> a block of edits (assuming it is a full block) has to do 32 rounds of
>> splitting the edits per region in order to apply them.
>>
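[Editor's note: purely illustrative arithmetic restating the defaults quoted
above; not part of the original email.]

public class ReplicationBlockMath {
  public static void main(String[] args) {
    long shipmentSizeCap   = 64L * 1024 * 1024;  // replication.source.size.capacity default: 64 MB (or 25000 edits, whichever is hit first)
    long clientWriteBuffer = 2L * 1024 * 1024;   // hbase.client.write.buffer default: 2 MB
    // A full 64 MB block therefore takes 32 buffer-sized rounds to apply on the sink side.
    System.out.println(shipmentSizeCap / clientWriteBuffer);  // 32
  }
}
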
> In ReplicationSink.java (0.94.6) I see that HTable.batch() is used,
> which writes directly to the RS without buffers?
>
>   private void batch(byte[] tableName, List<Row> rows) throws IOException {
>     if (rows.isEmpty()) {
>       return;
>     }
>     HTableInterface table = null;
>     try {
>       table = new HTable(tableName, this.sharedHtableCon, this.sharedThreadPool);
>       table.batch(rows);
>       this.metrics.appliedOpsRate.inc(rows.size());
>     } catch (InterruptedException ix) {
>       throw new IOException(ix);
>     } finally {
>       if (table != null) {
>         table.close();
>       }
>     }
>   }
>
>
>
>>
>> There is no setting specifically targeted at the buffer size for
>> replication, but maybe you could increase "hbase.client.write.buffer" to
>> 64 MB (67108864) on the slave cluster and see whether that makes a
>> difference. If it does, we can (1) add a setting to control the
>> ReplicationSink HTable's buffer size, or (2) just have it match the
>> replication buffer size "replication.source.size.capacity".
>>
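[Editor's note: a sketch of what the suggested experiment amounts to, not from
the thread; the property would normally be set in hbase-site.xml on the slave
cluster's region servers rather than in code. As the replies above point out,
it turned out to be moot because the sink writes via HTable.batch(), which
bypasses the client write buffer.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SinkBufferExperiment {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Default is 2097152 (2 MB); raising it to 64 MB would let a sink-side HTable
    // buffer hold a whole replicated block ("replication.source.size.capacity").
    conf.setLong("hbase.client.write.buffer", 67108864L);
    // Any HTable created from this configuration would use the larger buffer.
  }
}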
>>
>> -- Lars
>> ________________________________
>> From: lars hofhansl <[EMAIL PROTECTED]>
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Sent: Friday, June 21, 2013 1:48 AM
>> Subject: Re: Replication not suited for intensive write applications?
>>
>>
>> Thanks for checking... Interesting. So talking to 3 RSs as opposed to only
>> 1 before had no effect on the throughput?
>>
>> Would be good to explore this a bit more.
>> Since our RPC is not streaming, latency will affect throughput. In this
>> case there is latency while all edits are shipped to the RS in the slave
>> cluster and then extra latency when applying the edits there (which are
>> likely not local to that RS). A true streaming API should be better. If
>> that is the case, compression *could* help (but that is a big if).
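
[Editor's note: a made-up worked example of the latency point above, not from
the email. While one shipment is in flight and being applied, the single
replication source sits idle, so its sustained rate is roughly capped at
shipment size divided by the full round trip.]

public class LatencyBoundThroughput {
  public static void main(String[] args) {
    double shipmentMb   = 8.0;   // hypothetical size of one batch of shipped edits
    double shipSeconds  = 0.2;   // hypothetical time to send it to the slave RS
    double applySeconds = 0.3;   // hypothetical time for the slave to split and apply it
    System.out.println(shipmentMb / (shipSeconds + applySeconds) + " MB/s per source");  // ~16 MB/s cap
  }
}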
>>
>> The single thread shipping the edits to the slave should not be an issue
>> as the edits are actually applied by the slave RS, which will use multiple
>> threads to apply the edits in the local cluster.
>>
>> Also, my first reply (upon re-reading it) sounded a bit rough; that was
>> not intended.
>>
>> -- Lars
>>
>>
>> ----- Original Message -----