HBase, mail # user - Replication not suited for intensive write applications?


Re: Replication not suited for intensive write applications?
Asaf Mesika 2013-06-23, 06:33
bq. I'm not sure if it's really a problem tho.

Let's say the maximum throughput achieved by writing with k client threads is
30 MB/sec, where k = the number of region servers.
If you are consistently writing to HBase at more than 30 MB/sec - let's say 40
MB/sec with 2k threads - then you can't use HBase replication and must
write your own solution.
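
To put numbers on that, here is a minimal sketch of the arithmetic. It assumes replication ships at roughly the same 30 MB/sec that k writers reach (one replication thread per region server); that cap is an assumption drawn from the figures above, not a measurement, and the class and method names are hypothetical.

```java
// Hypothetical helper, not part of HBase: estimates how far replication
// falls behind when clients write faster than replication can ship.
final class ReplicationBacklog {
    private ReplicationBacklog() {}

    /**
     * Backlog in MB after the given number of seconds, assuming constant
     * write and replication rates (both in MB/sec). The result is never
     * negative: replication cannot ship edits that were not written yet.
     */
    static long backlogMb(long writeRate, long replicationRate, long seconds) {
        long growthPerSecond = Math.max(0, writeRate - replicationRate);
        return growthPerSecond * seconds;
    }

    public static void main(String[] args) {
        // 40 MB/sec written with 2k threads, 30 MB/sec replicated (assumed cap).
        System.out.println(backlogMb(40, 30, 3600) + " MB behind after one hour");
    }
}
```

With these rates the slave cluster falls behind by 10 MB every second, so the lag never recovers as long as the write load is sustained.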

One approach I've started thinking about is to somehow declare that, for a
specific table, the order of Puts is not important (say each write is unique),
so that you can spawn multiple threads to replicate a WAL file.
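
A rough sketch of that idea, in plain Java rather than against the HBase API: the edits of one WAL file are cut into chunks and each chunk is shipped by a worker thread. Everything here (`ParallelWalShipper`, `chunk`, `ship`, the `sink` callback) is hypothetical and only stands in for the real replication source and sink. It is safe only under the assumption stated above, that the order of Puts does not matter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch, not HBase code: ship one WAL file's edits with several
// threads. Chunks may arrive in any order, so edit order is NOT preserved.
final class ParallelWalShipper {
    private ParallelWalShipper() {}

    /** Splits edits into consecutive chunks of at most chunkSize entries. */
    static <T> List<List<T>> chunk(List<T> edits, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < edits.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, edits.size());
            chunks.add(new ArrayList<>(edits.subList(i, end)));
        }
        return chunks;
    }

    /**
     * Ships every chunk on a fixed pool of worker threads and waits for all
     * of them. The sink stands in for whatever sends a batch to the slave
     * cluster; it must be safe to call from several threads at once.
     */
    static <T> void ship(List<T> edits, int chunkSize, int threads, Consumer<List<T>> sink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> c : chunk(edits, chunkSize)) {
                pending.add(pool.submit(() -> sink.accept(c)));
            }
            for (Future<?> f : pending) {
                f.get(); // surfaces any failure raised by a worker
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while shipping edits", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("shipping a chunk failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller would pass the list of edits read from the WAL, a chunk size, a thread count, and a thread-safe sink. A failed chunk would still have to be retried before the WAL position is advanced, which this sketch does not cover.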
On Sat, Jun 22, 2013 at 12:18 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> I think that the same way writing with more clients helped throughput,
> writing with only 1 replication thread will hurt it. The clients in
> both cases have to read something (a file from HDFS or the WAL) then
> ship it, meaning that you can utilize the cluster better since a
> single client isn't consistently writing.
>
> I agree with Asaf's assessment that it's possible that you can write
> faster into HBase than you can replicate from it if your clients are
> using the write buffers and have a bigger aggregate throughput than
> replication's.
>
> I'm not sure if it's really a problem tho.
>
> J-D
>
> On Fri, Jun 21, 2013 at 3:05 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> > Hmm... Yes. Was worth a try :)  Should've checked and I even wrote that
> > part of the code.
> >
> > I have no good explanation then, and also no good suggestion about how
> > to improve this.
> >
> >
> >
> > ________________________________
> >  From: Asaf Mesika <[EMAIL PROTECTED]>
> > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
> > Sent: Friday, June 21, 2013 5:50 AM
> > Subject: Re: Replication not suited for intensive write applications?
> >
> >
> > On Fri, Jun 21, 2013 at 2:38 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >
> >> Another thought...
> >>
> >> I assume you only write to a single table, right? How large are your
> >> rows on average?
> >>
> > I'm writing to 2 tables: the average row size for the 1st table is 1500
> > bytes, and for the second one it is around 800 bytes.
> >
> >>
> >> Replication will send 64mb blocks by default (or 25000 edits, whatever
> >> is smaller). The default HTable buffer is 2mb only, so the slave RS
> >> receiving a block of edits (assuming it is a full block), has to do 32
> >> rounds of splitting the edits per region in order to apply them.
> >>
> > In the ReplicationSink.java (0.94.6) I see that HTable.batch() is used,
> > which writes directly to RS without buffers?
> >
> >   private void batch(byte[] tableName, List<Row> rows) throws IOException {
> >     if (rows.isEmpty()) {
> >       return;
> >     }
> >     HTableInterface table = null;
> >     try {
> >       table = new HTable(tableName, this.sharedHtableCon, this.sharedThreadPool);
> >       table.batch(rows);
> >       this.metrics.appliedOpsRate.inc(rows.size());
> >     } catch (InterruptedException ix) {
> >       throw new IOException(ix);
> >     } finally {
> >       if (table != null) {
> >         table.close();
> >       }
> >     }
> >   }
> >
> >
> >
> >>
> >> There is no setting specifically targeted at the buffer size for
> >> replication, but maybe you could increase "hbase.client.write.buffer" to
> >> 64mb (67108864) on the slave cluster and see whether that makes a
> >> difference. If it does we can (1) add a setting to control the
> >> ReplicationSink HTable's buffer size, or (2) just have it match the
> >> replication buffer size "replication.source.size.capacity".
> >>
> >>
> >> -- Lars
> >> ________________________________
> >> From: lars hofhansl <[EMAIL PROTECTED]>
> >> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> >> Sent: Friday, June 21, 2013 1:48 AM
> >> Subject: Re: Replication not suited for intensive write applications?
> >>
> >>
> >> Thanks for checking... Interesting. So talking to 3RSs as opposed to