HBase >> mail # user >> Replication not suited for intensive write applications?


Re: Replication not suited for intensive write applications?
Given that the region server writes to a single WAL at a time, doing
it with multiple threads might be hard. You also have to manage the
correct position up in ZK. It might be easier with multiple WALs.
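To illustrate why multiple shipper threads make the ZK position tricky, here is a minimal sketch in plain Java (the class and method names are hypothetical, not HBase's actual replication code): with concurrent shippers on one WAL, the position you can safely persist is a low-watermark, not simply the last offset shipped.

```java
import java.util.TreeSet;

// Hypothetical sketch (not HBase's actual code): with several shipper threads
// reading one WAL, the offset persisted to ZK must be a low-watermark --
// the highest offset below which *every* edit has been acked -- otherwise a
// crash could skip edits that a slower thread never shipped.
public class WalPositionTracker {
  private final TreeSet<Long> inFlight = new TreeSet<>();
  private long maxAcked = -1;

  // A shipper thread claims the edit at this WAL offset.
  public synchronized void beginShip(long offset) {
    inFlight.add(offset);
  }

  // The shipper acknowledges the edit as applied on the slave cluster.
  public synchronized void ackShip(long offset) {
    inFlight.remove(offset);
    maxAcked = Math.max(maxAcked, offset);
  }

  // The offset that would be safe to write to ZK: everything at or below it
  // has been acked, even if later offsets were acked out of order.
  public synchronized long safePosition() {
    return inFlight.isEmpty() ? maxAcked : Math.min(maxAcked, inFlight.first() - 1);
  }
}
```

Persisting the low-watermark instead of the newest acked offset trades some re-shipping after a failover for correctness.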

In any case, inserting at such a rate might not be doable over long
periods of time. How long were your benchmarks running for, exactly?
(can't find it in your first email)

You could also fancy doing regular bulk loads (say, every 30 minutes)
and consider shipping the same files to the other cluster.

Do you have a real use case in mind?

Thanks,

J-D

On Sat, Jun 22, 2013 at 11:33 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:
> bq. I'm not sure if it's really a problem tho.
>
> Let's say the maximum throughput achieved by writing with k client threads is
> 30 MB/sec, where k = the number of region servers.
> If you are consistently writing to HBase at more than 30 MB/sec - let's say
> 40 MB/sec with 2k threads - then you can't use HBase replication and must
> write your own solution.
>
> One way I started thinking about is to somehow declare that for a specific
> table, the order of Puts is not important (say each write is unique), so that
> you can spawn multiple threads for replicating a WAL file.
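As a sketch of that idea (plain Java, hypothetical names, no HBase APIs): if only per-row ordering matters, edits can be routed by row-key hash so the same row always lands on the same shipper thread, keeping that row's order while different rows ship in parallel.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: fan WAL edits out to N shipper threads, routing every
// edit for the same row key to the same shipper, so per-row order is preserved
// while different rows replicate in parallel.
public class ParallelWalShipper {
  private final ExecutorService[] shippers;

  public ParallelWalShipper(int threads) {
    shippers = new ExecutorService[threads];
    for (int i = 0; i < threads; i++) {
      shippers[i] = Executors.newSingleThreadExecutor();
    }
  }

  // Deterministic routing: the same row key always maps to the same shipper.
  static int shipperFor(byte[] rowKey, int threads) {
    return Math.floorMod(Arrays.hashCode(rowKey), threads);
  }

  // sendEdit stands in for whatever actually ships the edit to the slave.
  public void ship(byte[] rowKey, Runnable sendEdit) {
    shippers[shipperFor(rowKey, shippers.length)].execute(sendEdit);
  }

  public void shutdown() throws InterruptedException {
    for (ExecutorService s : shippers) {
      s.shutdown();
      s.awaitTermination(10, TimeUnit.SECONDS);
    }
  }
}
```

The ZK position problem remains, though: each shipper acks independently, so the persisted position still has to be a low-watermark across all of them.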
>
>
>
>
> On Sat, Jun 22, 2013 at 12:18 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>
>> I think that the same way writing with more clients helped throughput,
>> writing with only 1 replication thread will hurt it. The clients in
>> both cases have to read something (a file from HDFS or the WAL) then
>> ship it, meaning that more clients utilize the cluster better, since a
>> single client isn't constantly writing.
>>
>> I agree with Asaf's assessment that it's possible that you can write
>> faster into HBase than you can replicate from it if your clients are
>> using the write buffers and have a bigger aggregate throughput than
>> replication's.
>>
>> I'm not sure if it's really a problem tho.
>>
>> J-D
>>
>> On Fri, Jun 21, 2013 at 3:05 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>> > Hmm... Yes. Was worth a try :)  Should've checked, and I even wrote that
>> > part of the code.
>> >
>> > I have no good explanation then, and also no good suggestion about how
>> > to improve this.
>> >
>> >
>> >
>> > ________________________________
>> >  From: Asaf Mesika <[EMAIL PROTECTED]>
>> > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
>> > Sent: Friday, June 21, 2013 5:50 AM
>> > Subject: Re: Replication not suited for intensive write applications?
>> >
>> >
>> > On Fri, Jun 21, 2013 at 2:38 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>> >
>> >> Another thought...
>> >>
>> >> I assume you only write to a single table, right? How large are your
>> >> rows on average?
>> >>
>> > I'm writing to 2 tables: Avg row size for the 1st table is 1500 bytes, and
>> > the second is around 800 bytes.
>> >
>> >>
>> >> Replication will send 64mb blocks by default (or 25000 edits, whichever
>> >> is smaller). The default HTable buffer is 2mb only, so the slave RS
>> >> receiving a block of edits (assuming it is a full block) has to do 32
>> >> rounds of splitting the edits per region in order to apply them.
>> >>
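The 32 rounds quoted above are just the ratio of the two defaults; a minimal sketch of that arithmetic (the constants mirror the values in the mail, and it assumes the sink really does flush through a 2mb client buffer):

```java
// Sketch of the arithmetic in the mail: a 64mb replication batch flushed
// through a 2mb client write buffer would take 64/2 = 32 flush rounds.
public class ReplicationBatchMath {
  static final long SHIP_BATCH_BYTES = 64L * 1024 * 1024;   // replication batch default
  static final long WRITE_BUFFER_BYTES = 2L * 1024 * 1024;  // HTable write buffer default

  static long roundsPerBatch() {
    return SHIP_BATCH_BYTES / WRITE_BUFFER_BYTES;
  }
}
```

Whether the sink actually goes through the client write buffer at all is exactly what the HTable.batch() question below is about.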
>> > In ReplicationSink.java (0.94.6) I see that HTable.batch() is used,
>> > which writes directly to the RS without buffers?
>> >
>> >   private void batch(byte[] tableName, List<Row> rows) throws IOException {
>> >     if (rows.isEmpty()) {
>> >       return;
>> >     }
>> >     HTableInterface table = null;
>> >     try {
>> >       table = new HTable(tableName, this.sharedHtableCon, this.sharedThreadPool);
>> >       table.batch(rows);
>> >       this.metrics.appliedOpsRate.inc(rows.size());
>> >     } catch (InterruptedException ix) {
>> >       throw new IOException(ix);
>> >     } finally {
>> >       if (table != null) {
>> >         table.close();
>> >       }
>> >     }
>> >   }
>> >