Re: Replication not suited for intensive write applications?
On Thu, Jun 20, 2013 at 11:10 AM, Asaf Mesika <[EMAIL PROTECTED]> wrote:

> On Thu, Jun 20, 2013 at 7:12 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>
> > What is the ageOfLastShippedOp as reported on your master region servers
> > (should be available through /jmx)? It tells the delay your edits are
> > experiencing before being shipped. If this number is < 1000 (in
> > milliseconds), I would say replication is doing a very good job. This is
> > the most important metric worth tracking, and I would be interested in how
> > it looks, since we are also looking into using replication for write-heavy
> > workloads...
> >
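A quick way to watch it without JConsole: a minimal sketch, assuming the
0.94-era default region server info port 60030, that polls the /jmx servlet
and prints every attribute line mentioning ageOfLastShippedOp. The host, port,
and plain line-scan are placeholders to adapt, not production code.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Polls a region server's /jmx servlet and prints every line that
// mentions ageOfLastShippedOp. Assumes the 0.94-era default info
// port 60030; adjust host/port for your deployment.
public class AgeOfLastShippedOp {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        URL url = new URL("http://" + host + ":60030/jmx");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            if (line.contains("ageOfLastShippedOp")) {
                System.out.println(line.trim());
            }
        }
        in.close();
    }
}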
> ageOfLastShippedOp showed 10 min on 15 GB of inserted data. When I ran the
> test with 50 GB, it showed 30 min. This was also easy to spot in Graphite:
> I can see when the writeRequestsCount metric started increasing on the
> slave RS and when it stopped, and thus measure the duration of the
> replication.
>
> Although it is the single most important metric, I had to fire up JConsole
> on the 3 master RS: when using hadoop-metrics.properties and configuring a
> context for Graphite (or even a file), I discovered that if there is/was a
> recovered-edits queue from another RS, that queue's ageOfLastShippedOp was
> reported forever instead of the active queue's (since there isn't an
> ageOfLastShippedOp metric per queue).
>
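For anyone reproducing that wiring: the metrics-v1 setup being described looks
roughly like this hadoop-metrics.properties sketch, shown with the stock file
context (a Graphite context would be configured the same way); the period and
file path are placeholders.

# hadoop-metrics.properties (metrics v1): dump the hbase context,
# replication metrics included, to a local file every 10 seconds.
hbase.class=org.apache.hadoop.metrics.file.FileContext
hbase.period=10
hbase.fileName=/tmp/hbase-metrics.out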

In our tests run on 0.94.7 - we do see ageOfLastShippedOp per queue - so we
would see a giant number for the recovered queue and a small number for the
regular queue. Maybe you are running an old version which does not have
that.

>
>
> > The network traffic on your 2nd cluster could be lower because replication
> > ships edits in batches - so the batching could be amortizing the amount of
> > data sent over the wire. Also, when you are measuring traffic - are you
> > measuring the traffic on the NIC, which will also include traffic due to
> > HDFS replication?
> >
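The batch size is tunable, by the way. If I remember the knob names correctly
(worth double-checking against your version - they are an assumption here),
something like this in hbase-site.xml on the master cluster caps how much each
shipment carries:

<!-- hbase-site.xml on the master cluster; values are illustrative. -->
<property>
  <name>replication.source.size.capacity</name>
  <value>67108864</value> <!-- max bytes per shipped batch (64 MB) -->
</property>
<property>
  <name>replication.source.nb.capacity</name>
  <value>25000</value> <!-- max number of edits per shipped batch -->
</property>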
> My NIC/ethernet measuring is quite simple. I ran "netstat -ie", which
> gives a total counter of bytes, both Receive and Transmit, for my
> interface (eth0). Running it before and after gives the total amount of
> bytes transferred. I also know the duration of the replication work by
> watching the writeRequestsCount metric settle on the slave RS, so I can
> calculate the throughput: 15 GB / 14 min (roughly 18 MB/sec).
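That before/after diff is easy to automate. Here is a minimal sketch that
samples the per-interface byte counters from /proc/net/dev, waits, samples
again, and prints MB/sec - equivalent to diffing two "netstat -ie" outputs. It
assumes the standard Linux column layout (RX bytes first, TX bytes ninth after
the interface name); the interface and sample window are parameters to adapt.

import java.nio.file.Files;
import java.nio.file.Paths;

// Samples RX/TX byte counters for one interface from /proc/net/dev,
// sleeps, samples again, and prints MB/sec - the automated version
// of diffing two "netstat -ie" runs.
public class NicThroughput {
    static long[] sample(String iface) throws Exception {
        for (String line : Files.readAllLines(Paths.get("/proc/net/dev"))) {
            line = line.trim();
            if (line.startsWith(iface + ":")) {
                String[] f = line.substring(line.indexOf(':') + 1).trim().split("\\s+");
                return new long[] { Long.parseLong(f[0]), Long.parseLong(f[8]) };
            }
        }
        throw new IllegalArgumentException("no such interface: " + iface);
    }

    public static void main(String[] args) throws Exception {
        String iface = args.length > 0 ? args[0] : "eth0";
        long seconds = 60; // sample window
        long[] before = sample(iface);
        Thread.sleep(seconds * 1000);
        long[] after = sample(iface);
        double rx = (after[0] - before[0]) / (1024.0 * 1024.0) / seconds;
        double tx = (after[1] - before[1]) / (1024.0 * 1024.0) / seconds;
        System.out.printf("%s: %.1f MB/sec receive, %.1f MB/sec transmit%n",
                iface, rx, tx);
    }
}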
> Regarding your question - yes, it has to include all traffic on the card,
> which probably includes HDFS replication. There's not much I can do about
> that, though.
> We should note that network capacity is not the issue: I measured
> 30 MB/sec Receive and 20 MB/sec Transmit, far from the measured max
> bandwidth of 111 MB/sec (measured by running nc - netcat).
>
Yep, saturating the NIC is not easy!

>
>
>
> >
> > On Thu, Jun 20, 2013 at 3:46 AM, Asaf Mesika <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi,
> > >
> > > I've been conducting lots of benchmarks to test the maximum throughput
> > > of replication in HBase.
> > >
> > > I've come to the conclusion that HBase replication is not suited for
> > > write-intensive applications. I hope that people here can show me where
> > > I'm wrong.
> > >
> > > *My setup*
> > > *Cluster* (Master and slave are alike)
> > > 1 Master, NameNode
> > > 3 RS, DataNode
> > >
> > > All computers are the same: 8 cores x 3.4 GHz, 8 GB RAM, 1 Gigabit
> > > ethernet card
> > >
> > > I insert data into HBase from a Java process (client) reading files
> > > from disk, running on the machine running the HBase Master in the
> > > master cluster.
> > >
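For readers who want to reproduce the load: the client is shaped roughly like
the sketch below - 10 threads, each with its own HTable handle, batching Puts
with autoflush off. This is a reconstruction against the 0.94-era client API,
not the actual benchmark code; the table name, column family, and 1 KB payload
are made up.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Rough shape of the load client: 10 threads, each with its own
// HTable, writing 1 KB Puts with client-side batching enabled.
// Table/family/qualifier names and row counts are placeholders.
public class LoadClient {
    public static void main(String[] args) throws Exception {
        final Configuration conf = HBaseConfiguration.create();
        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (int t = 0; t < 10; t++) {
            final int id = t;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        HTable table = new HTable(conf, "test_table");
                        table.setAutoFlush(false); // batch puts client-side
                        byte[] value = new byte[1024];
                        for (long i = 0; i < 1000000L; i++) {
                            Put p = new Put(Bytes.toBytes(id + "-" + i));
                            p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), value);
                            table.put(p);
                        }
                        table.close(); // flushes remaining buffered puts
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
    }
}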
> > > *Benchmark Results*
> > > When the client writes with 10 threads, the master cluster writes at
> > > 17 MB/sec, while the replicated cluster writes at 12 MB/sec. The data
> > > size I wrote is 15 GB, all Puts, to two different tables.
> > > Both clusters, when tested independently without replication, achieved
> > > write throughput of 17-19 MB/sec, so evidently the replication process
> > > is the bottleneck.