HBase user mailing list: Replication not suited for intensive write applications?


Thread:
  Asaf Mesika 2013-06-20, 10:46
  Varun Sharma 2013-06-20, 16:12
  Asaf Mesika 2013-06-20, 18:10
  Varun Sharma 2013-06-20, 19:04
  lars hofhansl 2013-06-20, 20:02
  Asaf Mesika 2013-06-20, 20:38
  lars hofhansl 2013-06-20, 22:47
  Asaf Mesika 2013-06-21, 05:16
  lars hofhansl 2013-06-21, 08:48
  lars hofhansl 2013-06-21, 11:38
  Asaf Mesika 2013-06-21, 12:50
Re: Replication not suited for intensive write applications?
Hmm... Yes. Was worth a try :)  Should've checked and I even wrote that part of the code.

I have no good explanation then, and also no good suggestion about how to improve this.

________________________________
 From: Asaf Mesika <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
Sent: Friday, June 21, 2013 5:50 AM
Subject: Re: Replication not suited for intensive write applications?
 

On Fri, Jun 21, 2013 at 2:38 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Another thought...
>
> I assume you only write to a single table, right? How large are your rows
> on average?
>
I'm writing to 2 tables: average row size for the 1st table is 1500 bytes, and
for the second it is around 800 bytes.

>
> Replication will send 64 MB blocks by default (or 25000 edits, whichever is
> smaller). The default HTable buffer is only 2 MB, so the slave RS receiving
> a block of edits (assuming it is a full block) has to do 32 rounds of
> splitting the edits per region in order to apply them.
>
In ReplicationSink.java (0.94.6) I see that HTable.batch() is used, which
writes directly to the RS without buffers?

  // ReplicationSink.batch() from HBase 0.94.6: applies a shipment of
  // replicated edits via HTable.batch(), which sends the rows immediately
  // and never touches the client-side write buffer.
  private void batch(byte[] tableName, List<Row> rows) throws IOException {
    if (rows.isEmpty()) {
      return;
    }
    HTableInterface table = null;
    try {
      table = new HTable(tableName, this.sharedHtableCon, this.sharedThreadPool);
      table.batch(rows);
      this.metrics.appliedOpsRate.inc(rows.size());
    } catch (InterruptedException ix) {
      throw new IOException(ix);
    } finally {
      if (table != null) {
        table.close();
      }
    }
  }
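For contrast, here is a minimal sketch of the two client-side write paths being
discussed. This is an editor's illustration, not code from the thread; the
table name "t1" and the 64 MB buffer value are hypothetical.

  import java.io.IOException;
  import java.util.List;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Row;

  public class WritePathSketch {

    // Buffered path: puts accumulate client-side until the write buffer
    // (hbase.client.write.buffer, 2 MB by default) fills, then go out as
    // one multi-put per region server.
    static void bufferedWrites(Configuration conf, List<Put> puts)
        throws IOException {
      HTable table = new HTable(conf, "t1");
      try {
        table.setAutoFlush(false);
        table.setWriteBufferSize(64L * 1024 * 1024); // hypothetical 64 MB
        for (Put p : puts) {
          table.put(p); // buffered client-side, no RPC yet
        }
        table.flushCommits(); // ships the accumulated buffer
      } finally {
        table.close();
      }
    }

    // Direct path, as in ReplicationSink above: batch() groups the rows by
    // destination and sends them immediately; the write buffer is not used.
    static void directBatch(Configuration conf, List<Row> rows)
        throws IOException, InterruptedException {
      HTable table = new HTable(conf, "t1");
      try {
        table.batch(rows);
      } finally {
        table.close();
      }
    }
  }

The sink takes the second path, which is why slave-side write buffering never
comes into play.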

>
> There is no setting specifically targeted at the buffer size for
> replication, but maybe you could increase "hbase.client.write.buffer" to
> 64 MB (67108864) on the slave cluster and see whether that makes a
> difference. If it does we can (1) add a setting to control the
> ReplicationSink HTable's buffer size, or (2) just have it match the
> replication buffer size "replication.source.size.capacity".
>
>
> -- Lars
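As a sketch, the suggested experiment would amount to something like this in
hbase-site.xml on the slave cluster, using the 64 MB value from the text:

  <property>
    <name>hbase.client.write.buffer</name>
    <value>67108864</value> <!-- 64 MB, up from the 2 MB default -->
  </property>

As the reply at the top of this message concedes, though, ReplicationSink's
HTable.batch() call bypasses this buffer, so the setting would not have helped.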
> ________________________________
> From: lars hofhansl <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Friday, June 21, 2013 1:48 AM
> Subject: Re: Replication not suited for intensive write applications?
>
>
> Thanks for checking... Interesting. So talking to 3 RSs, as opposed to only
> 1 before, had no effect on the throughput?
>
> Would be good to explore this a bit more.
> Since our RPC is not streaming, latency will affect throughput. In this
> case there is latency while all edits are shipped to the RS in the slave
> cluster, and then extra latency when applying the edits there (which are
> likely not local to that RS). A true streaming API should be better. If
> that is the case, compression *could* help (but that is a big if).
>
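As a rough editor's illustration of that point (the latency numbers are made
up), a serial ship-then-apply loop caps throughput at:

  throughput ceiling = batch size / (ship latency + apply latency)
  e.g. 64 MB / (1 s ship + 1 s apply) = 32 MB/s per source, however fast
  the clusters are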
> The single thread shipping the edits to the slave should not be an issue
> as the edits are actually applied by the slave RS, which will use multiple
> threads to apply the edits in the local cluster.
>
> Also, my first reply, upon re-reading it, sounded a bit rough; that was
> not intended.
>
> -- Lars
>
>
> ----- Original Message -----
> From: Asaf Mesika <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <
> [EMAIL PROTECTED]>
> Cc:
> Sent: Thursday, June 20, 2013 10:16 PM
> Subject: Re: Replication not suited for intensive write applications?
>
> Thanks for taking the time to answer!
> My answers are inline.
>
> On Fri, Jun 21, 2013 at 1:47 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > I see.
> >
> > In HBase you have machines for both CPU (to serve requests) and storage
> > (to hold the data).
> >
> > If you only grow your cluster for CPU and you keep all RegionServers 100%
> > busy at all times, you are correct.
> >
> > Maybe you need to increase replication.source.size.capacity and/or
> > replication.source.nb.capacity (although I doubt that this will help
> > here).
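For reference, a sketch of those two knobs in hbase-site.xml on the source
cluster, shown at the defaults named earlier in the thread (64 MB per
shipment, 25000 edits):

  <property>
    <name>replication.source.size.capacity</name>
    <value>67108864</value> <!-- max bytes shipped per batch; 64 MB -->
  </property>
  <property>
    <name>replication.source.nb.capacity</name>
    <value>25000</value> <!-- max edits shipped per batch -->
  </property>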
Thread (continued):
  Jean-Daniel Cryans 2013-06-21, 21:18
  Asaf Mesika 2013-06-23, 06:33
  Jean-Daniel Cryans 2013-06-24, 20:29