HBase >> mail # user >> Hbase scalability performance
Re: Hbase scalability performance
I totally agree with Michael. I was about to point out the same thing. The
probability of region server hotspotting is high when you have sequential
keys. Even if everything is balanced and your cluster is very well
configured, you may still end up with this issue.
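
For instance, prefixing each sequential id with a hash-derived bucket spreads
writes across region servers. This is a rough sketch, not from the thread; the
bucket count and key format here are arbitrary choices:

```python
import hashlib

def salted_key(seq_id, num_buckets=16):
    """Prefix a sequential id with a stable, hash-derived bucket.

    The bucket is recomputable from the id alone, so readers can
    reconstruct the full key; consecutive ids land in different
    buckets instead of hammering a single region server.
    """
    digest = hashlib.md5(str(seq_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % num_buckets
    return "%02d-%010d" % (bucket, seq_id)
```

With 16 buckets you would also pre-split the table into 16 regions, one per
prefix, so the spread takes effect from the start.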

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/
On Sat, Dec 22, 2012 at 10:24 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:

> Also, check how balanced your region servers are across all the nodes.
>
> On Sat, Dec 22, 2012 at 8:50 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:
>
> > Note that adding nodes will improve throughput, not latency. So, if your
> > benchmarking client is single-threaded, do not expect an improvement in
> > the number of reads per second just by adding nodes.
> >
> > On Sat, Dec 22, 2012 at 8:23 AM, Michael Segel <[EMAIL PROTECTED]> wrote:
> >
> > > I thought it was Doug Meil who said that HBase doesn't start to shine
> > > until you have at least 5 nodes.
> > >
> > > I happen to concur, and if you want to start testing scalability, you
> > > will want to build a bigger test rig.
> > >
> > > Just saying!
> > >
> > >
> > > Oh, and you're going to have a hot spot with that row key.
> > > Maybe use a hashed UUID?
> > >
> > > I would suggest that you consider the following:
> > >
> > > Create N rows, where N is a very large number.
> > > Then, to generate your random access pattern, do a full table scan to
> > > read the N row keys into memory.
> > > Using a random number generator, pick a random index and pop that row
> > > key off the list, so that the next iteration chooses among the
> > > remaining N-1 keys.
> > > Do this 200K times.
> > >
> > > Now time your 200K random fetches.
> > >
> > > It would be interesting to see how it performs, averaged over a couple
> > > of runs... then increase the key space by an order of magnitude.
> > > (Start with 1 million rows, then 10 million, then 100 million...)
> > >
> > > In theory, if properly tuned, one should expect near-linear results.
> > > That is to say, the time it takes to get() a row should be consistent
> > > across the data space. Although I wonder if you would have to somehow
> > > clear the cache?
> > >
> > >
> > > Sorry, just a random thought...
> > >
> > > -Mike
> > >
> > > On Dec 22, 2012, at 10:06 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > >
> > > > By '3 datanodes', did you mean that you also increased the number of
> > > > region servers to 3?
> > > >
> > > > When your test was running, did you look at the web UI to see whether
> > > > the load was balanced? You can also use Ganglia for that purpose.
> > > >
> > > > What version of HBase are you using ?
> > > >
> > > > Thanks
> > > >
> > > > On Sat, Dec 22, 2012 at 7:43 AM, Dalia Sobhy <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> Dear all,
> > > >>
> > > >> I am testing a simple HBase application on a cluster of multiple
> > > >> nodes.
> > > >>
> > > >> I am especially testing scalability by measuring the time taken for
> > > >> random reads.
> > > >>
> > > >> Data size: 200,000 rows
> > > >> Row key: 0, 1, 2, ... (a very simple incremental key)
> > > >>
> > > >> But I don't know why, as I increase the cluster size, I see the same
> > > >> read time.
> > > >>
> > > >> For example:
> > > >> 2 datanodes: 1,000 random reads: 1.757 sec
> > > >> 3 datanodes: 1,000 random reads: 1.7 sec
> > > >>
> > > >> Any help, please?
> > > >>
> > > >>
> > >
> > >
> >
>
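
Michael's sampling scheme above can be sketched as follows. A plain dict
stands in for the HBase table and its get(), so this shows only the
selection and timing logic, not a real client:

```python
import random
import time

def benchmark_random_reads(table, num_reads):
    # "Full table scan": pull every row key into memory.
    keys = list(table)
    start = time.time()
    for _ in range(num_reads):
        # Pick a random index and pop that key off the list, so the
        # next iteration chooses among the remaining keys and no key
        # is ever fetched twice.
        i = random.randrange(len(keys))
        key = keys.pop(i)
        _ = table[key]  # stand-in for an HBase get()
    return time.time() - start
```

With a real client you would replace the dict lookup with a `Get` against
the table, and average the elapsed time over a couple of runs.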
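
On Varun's point about single-threaded clients: throughput gains from added
nodes only show up when the client issues reads concurrently. A minimal
sketch, again with a dict standing in for the table and an arbitrary thread
count:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_reads(table, keys, num_threads=8):
    # Each worker thread issues independent reads; with a real HBase
    # client, this concurrency is what lets extra region servers
    # raise the aggregate reads-per-second number.
    def fetch(key):
        return table[key]  # stand-in for a per-key get()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() returns results in the order the keys were given.
        return list(pool.map(fetch, keys))
```

Comparing elapsed time for the same key list at 1, 4, and 8 threads would
show whether the cluster, rather than the client, is the bottleneck.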