Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase >> mail # user >> Replication not suited for intensive write applications?


Copy link to this message
-
Re: Replication not suited for intensive write applications?
First off, this is a pretty constructed case leading to a specious general conclusion.

If you only have three RSs/DNs and the default replication factor of 3, each machine will get every single write.
That is the first issue. Using HBase makes little sense with such a small cluster.

Secondly, as you say yourself, there are only three regionservers writing to the replicated cluster using a single thread each in order to preserve ordering.
With more region servers your scale will tip the other way. Again more regionservers will make this better.
As for your other question, more threads can lead to better interleaving of CPU and IO, thus leading to better throughput (this relationship is not linear, though).
-- Lars

----- Original Message -----
From: Asaf Mesika <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Cc:
Sent: Thursday, June 20, 2013 3:46 AM
Subject: Replication not suited for intensive write applications?

Hi,

I've been conducting lots of benchmarks to test the maximum throughput of
replication in HBase.

I've come to the conclusion that HBase replication is not suited for write
intensive application. I hope that people here can show me where I'm wrong.

*My setup*
*Cluster (*Master and slave are alike)
1 Master, NameNode
3 RS, Data Node

All computers are the same: 8 Cores x 3.4 GHz, 8 GB Ram, 1 Gigabit ethernet
card

I insert data into HBase from a java process (client) reading files from
disk, running on the machine running the HBase Master in the master cluster.

*Benchmark Results*
When the client writes with 10 Threads, then the master cluster writes at
17 MB/sec, while the replicated cluster writes at 12 Mb/sec. The data size
I wrote is 15 GB, all Puts, to two different tables.
Both clusters when tested independently without replication, achieved write
throughput of 17-19 MB/sec, so evidently the replication process is the
bottleneck.

I also tested connectivity between the two clusters using "netcat" and
achieved 111 MB/sec.
I've checked the usage of the network cards both on the client, master
cluster region server and slave region servers. No computer when over
30mb/sec in Receive or Transmit.
The way I checked was rather crud but works: I've run "netstat -ie" before
HBase in the master cluster starts writing and after it finishes. The same
was done on the replicated cluster (when the replication started and
finished). I can tell the amount of bytes Received and Transmitted and I
know that duration each cluster worked, thus I can calculate the throughput.

*The bottleneck in my opinion*
Since we've excluded network capacity, and each cluster works at faster
rate independently, all is left is the replication process.
My client writes to the master cluster with 10 Threads, and manages to
write at 17-18 MB/sec.
Each region server has only 1 thread responsible for transmitting the data
written to the WAL to the slave cluster. Thus in my setup I effectively
have 3 threads writing to the slave cluster.  Thus this is the bottleneck,
since this process can not be parallelized, since it must transmit the WAL
in a certain order.

*Conclusion*
When writes intensively to HBase with more than 3 threads (in my setup),
you can't use replication.

*Master throughput without replication*
On a different note, I have one thing I couldn't understand at all.
When turned off replication, and wrote with my client with 3 threads I got
throughput of 11.3 MB/sec. When I wrote with 10 Threads (any more than that
doesn't help) I achieved maximum throughput of 19 MB/sec.
The network cards showed 30MB/sec Receive and 20MB/sec Transmit on each RS,
thus the network capacity was not the bottleneck.
On the HBase master machine which ran the client, the network card again
showed Receive throughput of 0.5MB/sec and Transmit throughput of 18.28
MB/sec. Hence it's the client machine network card creating the bottleneck.

The only explanation I have is the synchronized writes to the WAL. Those 10
threads have to get in line, and one by one, write their batch of Puts to
the WAL, which creates a bottleneck.

*My question*:
The one thing I couldn't understand is: When I write with 3 Threads,
meaning I have no more than 3 concurrent RPC requests to write in each RS.
They achieved 11.3 MB/sec.
The write to the WAL is synchronized, so why increasing the number of
threads to 10 (x3 more) actually increased the throughput to 19 MB/sec?
They all get in line to write to the same location, so it seems have
concurrent write shouldn't improve throughput at all.
Thanks you!

Asaf
*
*