Dalia Sobhy
2012-12-22, 15:43
Ted Yu
2012-12-22, 16:06
Michael Segel
2012-12-22, 16:23
Varun Sharma
2012-12-22, 16:50
Mohit Anchlia
2012-12-22, 16:54
Mohammad Tariq
2012-12-22, 17:39
Dalia Sobhy
2012-12-23, 13:38
Dalia Sobhy
2012-12-23, 13:42
Dalia Sobhy
2012-12-23, 13:44
Dimitry Goldin
2012-12-23, 13:57
Mohammad Tariq
2012-12-23, 22:05
-
Hbase scalability performance (Dalia Sobhy, 2012-12-22, 15:43)
Dear all,
I am testing a simple HBase application on a cluster of multiple nodes.
I am specifically testing scalability, by measuring the time taken for random reads.
Data size: 200,000 rows
Row key: 0, 1, 2, ... (a very simple, incremental row key)
But I don't know why I see the same time as I increase the cluster size. For example:
2 DataNodes: 1000 random reads: 1.757 sec
3 DataNodes: 1000 random reads: 1.7 sec
So, any help, please?
-
Re: Hbase scalability performance (Ted Yu, 2012-12-22, 16:06)
By '3 datanodes', did you mean that you also increased the number of region servers to 3?
When your test was running, did you look at the web UI to see whether the load was balanced? You can also use Ganglia for that purpose.
What version of HBase are you using?
Thanks
-
Re: Hbase scalability performance (Michael Segel, 2012-12-22, 16:23)
I thought it was Doug Miel who said that HBase doesn't start to shine until you have at least 5 nodes.
(Apologies if I misspelled Doug's name.) I happen to concur, and if you want to start testing scalability, you will want to build a bigger test rig. Just saying!

Oh, and you're going to have a hot spot on that row key. Maybe use a hashed UUID?

I would suggest that you consider the following:
Create N rows, where N is a very large number.
Then, to generate your random access, do a full table scan to get the N row keys into memory.
Using a random number generator, generate a random number and pop that row off the stack, so that the next iteration is between 1 and (N-1).
Do this 200K times.
Now time your 200K random fetches.

It would be interesting to see how it performs, averaged over a 'couple' of runs... then increase the key space by an order of magnitude (start with 1 million rows, then 10 million, 100 million...).
In theory, if properly tuned, one should expect near-linear results. That is to say, the time it takes to get() a row across the data space should be consistent. Although I wonder if you would have to somehow clear the cache?

Sorry, just a random thought...
-Mike
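A minimal Java sketch of the key-selection step Mike describes, assuming the row keys from the full table scan have already been collected into a `List<String>` (the scan itself is omitted). `RandomKeyPicker` is a hypothetical name, not part of HBase. Each call removes one uniformly chosen key, so the next draw comes from the remaining N-1 keys; swapping the chosen key with the last one before removing it keeps each pop O(1).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Random;

/** Hypothetical helper: draws row keys at random without replacement. */
final class RandomKeyPicker {
    private final List<String> remaining;
    private final Random random;

    RandomKeyPicker(List<String> allKeys, long seed) {
        this.remaining = new ArrayList<>(allKeys); // copy, so the caller's list is untouched
        this.random = new Random(seed);            // fixed seed makes runs repeatable
    }

    boolean hasNext() {
        return !remaining.isEmpty();
    }

    /** Removes and returns a uniformly random key among those not yet picked. */
    String next() {
        if (remaining.isEmpty()) {
            throw new NoSuchElementException("all keys have been picked");
        }
        int last = remaining.size() - 1;
        int i = random.nextInt(remaining.size());
        String chosen = remaining.get(i);
        // Move the last key into the chosen slot, then drop the tail:
        // O(1) per pop instead of the O(N) shift of remaining.remove(i).
        remaining.set(i, remaining.get(last));
        remaining.remove(last);
        return chosen;
    }
}
```

In the benchmark loop, each `next()` result would be turned into a `Get` and fetched, with the timer wrapped around the fetches only, so the in-memory key handling is not counted.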
-
Re: Hbase scalability performance (Varun Sharma, 2012-12-22, 16:50)
Note that adding nodes will improve throughput, not latency. So, if your
client application for benchmarking is single-threaded, do not expect an improvement in the number of reads per second just from adding nodes.
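Varun's point can be checked against the numbers in the original post. Assuming the benchmark issues one get() at a time and waits for each reply (concurrency L = 1), Little's law relates throughput X, concurrency L, and mean time per read W as X = L / W. The subscripts below are the number of DataNodes.

```latex
% Assumption: single-threaded client, so L = 1 request in flight at a time.
\begin{aligned}
W_{2} &= \frac{1.757\ \mathrm{s}}{1000\ \text{reads}} \approx 1.76\ \mathrm{ms}, &
X_{2} &= \frac{L}{W_{2}} = \frac{1}{1.757\ \mathrm{ms}} \approx 569\ \text{reads/s} \\
W_{3} &= \frac{1.7\ \mathrm{s}}{1000\ \text{reads}} = 1.70\ \mathrm{ms}, &
X_{3} &= \frac{L}{W_{3}} = \frac{1}{1.70\ \mathrm{ms}} \approx 588\ \text{reads/s}
\end{aligned}
```

Both figures are set by per-request round-trip time, so with L = 1 the extra node has little effect. With L client threads each keeping one request in flight, the same relation suggests X ≈ L / W until the region servers or the network saturate, and that is roughly the regime where adding nodes would be expected to show up as higher throughput.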
-
Re: Hbase scalability performance (Mohit Anchlia, 2012-12-22, 16:54)
Also, check how balanced your region servers are across all the nodes.
-
Re: Hbase scalability performance (Mohammad Tariq, 2012-12-22, 17:39)
I totally agree with Michael. I was about to point out the same thing.
The probability of region server (RS) hotspotting is high when you have sequential keys. Even if everything is balanced and your cluster is very well configured, you might end up with this issue.
Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/
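Mike suggested a hashed UUID. A closely related and common variant, shown here as a minimal Java sketch, keeps the original id but prefixes it with a few hex characters of its MD5 hash, so consecutive ids (0, 1, 2, ...) no longer sort next to each other. `HashedRowKey` and the `"<prefix>-<id>"` layout are illustrative assumptions. The prefix only spreads load once the table actually has several regions on different region servers.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: builds "<4 hex chars of MD5(id)>-<id>" row keys. */
final class HashedRowKey {
    private HashedRowKey() {}

    /** Deterministic, so a reader can rebuild the key from the id for a get(). */
    static String of(long id) {
        String plain = Long.toString(id);
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(plain.getBytes(StandardCharsets.UTF_8));
            // Two digest bytes = 4 hex chars = 65,536 possible prefixes.
            String prefix = String.format("%02x%02x", digest[0] & 0xff, digest[1] & 0xff);
            // Keeping the plain id as a suffix keeps keys unique even if prefixes collide.
            return prefix + "-" + plain;
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }
}
```

With the HBase client, the resulting string would be converted with `Bytes.toBytes(...)` for both the Put and the Get. The trade-off is that a scan over a contiguous range of ids is no longer a single contiguous key range.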
-
RE: Hbase scalability performance (Dalia Sobhy, 2012-12-23, 13:38)
So, do you have an example of a multithreaded program? I am using the ready-made Java API, not the Thrift server, so I don't know how to write a multithreaded program using this API.
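A minimal Java 8+ sketch of a multithreaded random-read benchmark. `RowReader`, `RowReaderFactory` and `ParallelReadBenchmark` are hypothetical names; the factory is called once per worker thread because an HTable instance is not meant to be shared between threads (HTablePool exists for that purpose). The harness returns measured reads per second, so runs with 1, 2, 4, 8... threads can be compared on each cluster size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical hook: performs one read, e.g. an HTable get() for the row built from rowId. */
interface RowReader {
    void read(long rowId) throws Exception;
}

/** Hypothetical hook: called once per worker, e.g. to open that thread's own HTable. */
interface RowReaderFactory {
    RowReader open() throws Exception;
}

final class ParallelReadBenchmark {
    private ParallelReadBenchmark() {}

    /** Runs threads * readsPerThread reads over ids in [0, keySpace); returns reads/second. */
    static double run(int threads, int readsPerThread, long keySpace,
                      RowReaderFactory factory) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> {
                    RowReader reader = factory.open(); // never shared across threads
                    for (int i = 0; i < readsPerThread; i++) {
                        reader.read(ThreadLocalRandom.current().nextLong(keySpace));
                    }
                    return readsPerThread;
                });
            }
            // The timed section includes each worker's open() call.
            long start = System.nanoTime();
            int completed = 0;
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                completed += f.get(); // rethrows (wrapped) any exception from a worker
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            return completed / seconds;
        } finally {
            pool.shutdown();
        }
    }
}
```

Against a real cluster, `open()` might construct `new HTable(conf, tableName)` and `read()` might call `table.get(new Get(Bytes.toBytes(key)))`; the key format has to match however the rows were written.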
-
RE: Hbase scalability performance (Dalia Sobhy, 2012-12-23, 13:42)
Dear all,
Thanks for your help. I am already using coprocessors for this table.
I previously tried a similar program using the Thrift server on a 23-node cluster on the Rackspace cloud, and likewise I didn't see any performance improvement. I was then advised to use physical machines (not virtual ones) and more bandwidth than 100 Mbps; I was told those two issues were causing this performance. But when I tried that, I saw the same result.
-
RE: Hbase scalability performance (Dalia Sobhy, 2012-12-23, 13:44)
I am using 3 region servers.
HBase version: 0.92
Cloudera Manager: 4.1
Ted, how can I tell whether the load is balanced?
-
Re: Hbase scalability performance (Dimitry Goldin, 2012-12-23, 13:57)
Hi,
On 23.12.2012 14:38, Dalia Sobhy wrote:
> So do you have an example of multithreading program, because I am using the read-made Java API not thrift server, so I don't know how to write a multithreaded program using this API.
You should take a look at YCSB (https://github.com/brianfrankcooper/YCSB); maybe one of the premade workloads fits your scenario.
Cheers
-
Re: Hbase scalability performance (Mohammad Tariq, 2012-12-23, 22:05)
Hello Dalia,
You can go to the HBase web UI to see the details, as Ted mentioned earlier. But if you really want to monitor everything properly, I would suggest configuring Ganglia to capture the metrics. For a quick check, you can also use the "status" command from the HBase shell:
hbase> status
hbase> status 'simple'
hbase> status 'summary'
hbase> status 'detailed'
HTH
Best Regards,
Tariq