HBase, mail # user - Hbase scalability performance


Dalia Sobhy 2012-12-22, 15:43
Ted Yu 2012-12-22, 16:06
Michael Segel 2012-12-22, 16:23
Varun Sharma 2012-12-22, 16:50
Re: Hbase scalability performance
Mohit Anchlia 2012-12-22, 16:54
Also, check how balanced your region servers are across all the nodes.
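A quick way to put a number on "how balanced" is to compare the busiest region server with the average. This is a minimal sketch, assuming you have already collected how many regions each region server hosts (for example, by reading them off the Master web UI). The server names and counts below are hypothetical. A ratio well above 1.0 means one server is carrying most of the load. In particular, if a 200,000-row table still fits in a single region, every read goes to the same server no matter how many nodes are added.

```python
from statistics import mean
from typing import Dict


def region_balance(regions_per_server: Dict[str, int]) -> float:
    """Return max/mean region count per server; 1.0 means perfectly even."""
    if not regions_per_server:
        raise ValueError("need at least one region server")
    counts = list(regions_per_server.values())
    avg = mean(counts)
    if avg == 0:
        return 1.0  # no regions anywhere: trivially balanced
    return max(counts) / avg


if __name__ == "__main__":
    # Hypothetical counts, e.g. copied from the Master web UI.
    servers = {"node1": 10, "node2": 1, "node3": 1}
    print(f"max/mean regions per server = {region_balance(servers):.2f}")
```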

On Sat, Dec 22, 2012 at 8:50 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> Note that adding nodes will improve throughput, not latency. So, if your
> client application for benchmarking is single-threaded, do not expect an
> improvement in the number of reads per second just by adding nodes.
>
> On Sat, Dec 22, 2012 at 8:23 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
> > I thought it was Doug Miel who said that HBase doesn't start to shine
> > until you have at least 5 nodes.
> > (Apologies if I misspelled Doug's name.)
> >
> > I happen to concur, and if you want to start testing scalability, you will
> > want to build a bigger test rig.
> >
> > Just saying!
> >
> >
> > Oh, and you're going to have a hot spot on that row key.
> > Maybe use a hashed UUID?
> >
> > I would suggest that you consider the following:
> >
> > Create N rows, where N is a very large number.
> > Then, to generate your random access, do a full table scan to get the N
> > row keys into memory.
> > Using a random number generator, generate a random index and pop that
> > row key off the list, so that the next iteration draws from 1 to (N-1).
> > Do this 200K times.
> >
> > Now time your 200K random fetches.
> >
> > It would be interesting to see how it performs, averaged over a 'couple'
> > of runs... then increase the key space by an order of magnitude each time.
> > (Start with 1 million rows, then 10 million, then 100 million rows.)
> >
> > In theory, if properly tuned, one should expect near-linear results.
> > That is to say, the time it takes to get() a row should stay consistent
> > as the data space grows. Although I wonder whether you would have to
> > somehow clear the cache?
> >
> >
> > Sorry, just a random thought...
> >
> > -Mike
> >
> > On Dec 22, 2012, at 10:06 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > By '3 datanodes', did you mean that you also increased the number of
> > > region servers to 3?
> > >
> > > When your test was running, did you look at the Web UI to see whether
> > > the load was balanced? You can also use Ganglia for that purpose.
> > >
> > > What version of HBase are you using?
> > >
> > > Thanks
> > >
> > > On Sat, Dec 22, 2012 at 7:43 AM, Dalia Sobhy <[EMAIL PROTECTED]> wrote:
> > >
> > >> Dear all,
> > >>
> > >> I am testing a simple HBase application on a cluster of multiple nodes.
> > >>
> > >> I am specifically testing scalability by measuring the time taken for
> > >> random reads.
> > >>
> > >> Data size: 200,000 rows
> > >> Row key: 0, 1, 2, ... (a very simple incremental row key)
> > >>
> > >> But I don't know why, when I increase the cluster size, I see the same
> > >> time.
> > >>
> > >> For example:
> > >> 2 datanodes: 1000 random reads: 1.757 sec
> > >> 3 datanodes: 1000 random reads: 1.7 sec
> > >>
> > >> Can anyone help, please?
> > >>
> > >>
> >
> >
>
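Varun's point that extra nodes raise throughput rather than latency can be seen with a toy model. This is a minimal sketch, assuming each read is a blocking call with a fixed round-trip time. `fake_get` and `LATENCY_S` are hypothetical stand-ins, not HBase client APIs. With one client thread the rate is capped at roughly 1 / latency, so any extra cluster capacity only shows up once the client keeps several requests in flight.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-get() round trip, close to the ~1.76 ms per read
# implied by 1000 reads in 1.757 s.
LATENCY_S = 0.002


def fake_get(key: int) -> int:
    """Stand-in for one blocking HBase get(); it only sleeps."""
    time.sleep(LATENCY_S)
    return key


def reads_per_second(n_reads: int, threads: int) -> float:
    """Issue n_reads fake gets from `threads` client threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(fake_get, range(n_reads)))
    return n_reads / (time.perf_counter() - start)


if __name__ == "__main__":
    for t in (1, 4, 16):
        print(f"{t:>2} client threads: {reads_per_second(200, t):6.0f} reads/s")
```

In a real test the threaded rate keeps rising only until the region servers, disks, or network saturate; that ceiling is what adding nodes raises.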
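Mike's hot-spot remark applies to keys like 0, 1, 2, ..., which sort next to each other and therefore land in the same region. One common remedy, sketched here rather than taken from the thread, is to prefix each sequential id with part of a hash of itself. `hashed_row_key` and the 8-character MD5 prefix are assumptions for illustration. Because the prefix is derived from the id, a reader can rebuild the exact key for a get() from the id alone.

```python
import hashlib


def hashed_row_key(seq_id: int, prefix_len: int = 8) -> bytes:
    """Prefix a sequential id with a slice of its MD5 hex digest.

    The prefix scatters consecutive ids across the key space (and so
    across regions), while the id itself stays in the key.
    """
    raw = str(seq_id).encode("ascii")
    prefix = hashlib.md5(raw).hexdigest()[:prefix_len]
    return prefix.encode("ascii") + b"-" + raw


if __name__ == "__main__":
    for i in range(3):
        print(hashed_row_key(i))
```

The trade-off is that a range scan over consecutive ids no longer reads contiguous rows. Keys like these also pay off best on a pre-split table, since a newly created table starts with a single region.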
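The random-access procedure Mike describes (load all N keys, draw without replacement, time the fetches, average a couple of runs) can be sketched as follows. This is an illustration under stated assumptions: `fetch` is a hypothetical stand-in for whatever get() call your client uses, and the in-memory dict in the main block only exercises the harness. Keys are drawn before the timer starts, so only the fetches are measured.

```python
import random
import time
from typing import Callable, List, Optional, Sequence


def draw_distinct_keys(row_keys: Sequence[bytes], n: int,
                       seed: Optional[int] = None) -> List[bytes]:
    """Pick n keys without replacement by popping random entries."""
    if n > len(row_keys):
        raise ValueError("cannot draw more distinct keys than exist")
    rng = random.Random(seed)
    pool = list(row_keys)  # all N keys held in memory
    picked = []
    for _ in range(n):
        i = rng.randrange(len(pool))
        pool[i], pool[-1] = pool[-1], pool[i]  # swap to end for O(1) pop
        picked.append(pool.pop())              # next draw sees one key fewer
    return picked


def time_fetches(keys: Sequence[bytes],
                 fetch: Callable[[bytes], object]) -> float:
    """Wall-clock seconds to fetch every key once, in order."""
    start = time.perf_counter()
    for k in keys:
        fetch(k)
    return time.perf_counter() - start


def average_fetch_time(row_keys: Sequence[bytes],
                       fetch: Callable[[bytes], object],
                       n_fetches: int = 200_000, runs: int = 3) -> float:
    """Mean time over several runs, each with a fresh random draw."""
    totals = [time_fetches(draw_distinct_keys(row_keys, n_fetches, seed=r), fetch)
              for r in range(runs)]
    return sum(totals) / runs


if __name__ == "__main__":
    # Hypothetical stand-in for the table: a dict instead of an HBase client.
    table = {str(i).encode(): b"value" for i in range(1_000_000)}
    secs = average_fetch_time(list(table), table.__getitem__)
    print(f"average of 3 runs: {secs:.3f} s for 200K fetches")
```

To follow the rest of the suggestion, rerun it with 1 million, 10 million, and 100 million rows and compare the averages.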
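Dalia's own numbers already hint at the answer. Assuming the 1000 reads were issued one after another from a single client thread, the totals convert to per-read latency and throughput as follows:

```latex
\[
\bar t_{2} = \frac{1.757\ \text{s}}{1000} \approx 1.76\ \text{ms/read}, \qquad
\bar t_{3} = \frac{1.7\ \text{s}}{1000} = 1.7\ \text{ms/read}
\]
\[
\frac{1000}{1.757\ \text{s}} \approx 569\ \text{reads/s}, \qquad
\frac{1000}{1.7\ \text{s}} \approx 588\ \text{reads/s}
\]
\[
\frac{1.757}{1.7} \approx 1.03 \ \text{(about 3\% faster)}
\quad\text{vs.}\quad
\frac{3}{2} = 1.5 \ \text{(50\% more datanodes)}
\]
```

A 50% larger cluster bought about a 3% change, which is what one would expect if per-read latency, not cluster capacity, is the bottleneck.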
Mohammad Tariq 2012-12-22, 17:39
Dalia Sobhy 2012-12-23, 13:42
Dalia Sobhy 2012-12-23, 13:38
Dimitry Goldin 2012-12-23, 13:57
Mohammad Tariq 2012-12-23, 22:05
Dalia Sobhy 2012-12-23, 13:44