y_823910@...
2010-01-04, 07:55
stack
2010-01-05, 06:44
stack
2010-01-05, 06:46
y_823910@...
2010-01-05, 06:49
y_823910@...
2010-01-05, 07:13
stack
2010-01-05, 07:21
y_823910@...
2010-01-05, 07:53
y_823910@...
2010-01-07, 06:55
Jean-Daniel Cryans
2010-01-07, 19:12
-
HBase reading test - y_823910@... 2010-01-04, 07:55
Hi,
There are 2 region servers (2 GB memory) and 5 data nodes in my cluster. I want to test HBase read performance by writing a program that uses the HBase client. In that code I use a secondary index to scan the data I need; it took 80 sec to fetch 5243 rows, which was very cool!

Then I deployed the same program to two more machines to test how well HBase handles concurrent client reads. Each client fetches the same data (5243 rows). The results are as follows:

1 concurrent client read: 80 sec
2 concurrent client reads: 104 sec
3 concurrent client reads: 232 sec

As shown above, adding more concurrent reading clients seems to degrade HBase performance far too much. Any opinions?

Fleming Chiu (邱宏明)
707-6128
[EMAIL PROTECTED]
Meatless Monday: go vegetarian and save the Earth (Meat Free Monday Taiwan)

---------------------------------------------------------------------------
TSMC PROPERTY
This email communication (and any attachments) is proprietary information for the sole use of its intended recipient. Any unauthorized review, use or distribution by anyone other than the intended recipient is strictly prohibited. If you are not the intended recipient, please notify the sender by replying to this email, and then delete this email and any copies of it immediately. Thank you.
---------------------------------------------------------------------------
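For readers who want to reproduce a test like this, here is a minimal Java harness sketch. Fleming's actual program is not shown, so the read is assumed to be wrapped in a `Runnable` (the `workload` parameter and the `ConcurrentReadBench` name are hypothetical). It releases all clients at the same moment and records each one's wall-clock time. Note that it runs the clients as threads inside one JVM, which is exactly the setup the first reply asks about; for a fair test, run one copy per process or per machine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentReadBench {

    /**
     * Runs the same workload from `clients` threads that all start together
     * and returns each client's wall-clock time in milliseconds.
     * `workload` is a hypothetical stand-in for "fetch the 5243 rows".
     */
    public static List<Long> run(int clients, Runnable workload) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        CountDownLatch startGun = new CountDownLatch(1); // releases all clients at once
        List<Future<Long>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < clients; i++) {
                futures.add(pool.submit((Callable<Long>) () -> {
                    startGun.await();
                    long t0 = System.nanoTime();
                    workload.run();
                    return (System.nanoTime() - t0) / 1_000_000;
                }));
            }
            startGun.countDown();
            List<Long> millis = new ArrayList<>();
            for (Future<Long> f : futures) {
                millis.add(f.get()); // rethrows any failure from the workload
            }
            return millis;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload: sleep 50 ms instead of reading from HBase.
        Runnable fakeRead = () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int n = 1; n <= 3; n++) {
            System.out.println(n + " concurrent client(s): " + run(n, fakeRead) + " ms");
        }
    }
}
```

Reporting per-client times, rather than one total, makes it easier to see whether every client slows down evenly or one of them is stuck waiting.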
-
Re: HBase reading test - stack 2010-01-05, 06:44
Are the clients all running in a single process? If so, try running them as distinct processes.

St.Ack
-
Re: HBase reading test - stack 2010-01-05, 06:46
My guess is that you have too little data. Try adding 500k rows. What is your schema like? What size is your data?

St.Ack
-
Re: HBase reading test - y_823910@... 2010-01-05, 06:49
No, I dispatched that program to three different machines.
-
Re: HBase reading test - y_823910@... 2010-01-05, 07:13
Our data is about 6 GB, in more than 500k rows. Our schema has only two column families and a few qualifiers (mirroring the Oracle columns).

We are going to fire off thousands of clients to fetch data from HBase, but it became very slow even when we only increased to 3 clients. We tried scaling out to 4 region servers; unfortunately, it was worse than before.

Would it help if I set handler.count to 20? Here is the property, at its default of 10:

<property>
  <name>hbase.regionserver.handler.count</name>
  <value>10</value>
  <description>Count of RPC Server instances spun up on RegionServers
  Same property is used by the HMaster for count of master handlers.
  Default is 10.
  </description>
</property>
-
Re: HBase reading test - stack 2010-01-05, 07:21
Well, with only 3 clients it's probably not the handler count, though yes, on a loaded cluster you should raise the handler counts all around (in HBase and in HDFS). Check out the performance page on the wiki; is there anything there that helps? Going by other folks' experience, 3 clients having this much trouble is a bit odd. Can you figure out where the time is being spent?

Thanks,
St.Ack
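stack's point about the handler count can be illustrated with a toy model. This is a hypothetical sketch, not HBase's RPC server code: it treats `hbase.regionserver.handler.count` as the size of a fixed thread pool and assumes each client has exactly one blocking request outstanding at a time. Under those assumptions, requests only start to queue once clients outnumber handlers, so 3 clients cannot exhaust the default 10 handlers.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class HandlerPoolModel {

    /**
     * Toy model: a RegionServer with `handlers` RPC handler threads receives one
     * blocking request from each of `clients` clients at the same moment.
     * Returns how many requests must wait in the queue for a free handler.
     */
    public static int queuedRequests(int handlers, int clients) {
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(handlers);
        CountDownLatch release = new CountDownLatch(1); // keeps every busy handler blocked
        try {
            for (int i = 0; i < clients; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // simulates a long-running read
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The first `handlers` requests each get their own thread; the rest are queued.
            return pool.getQueue().size();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("3 clients, 10 handlers -> queued: " + queuedRequests(10, 3));   // 0
        System.out.println("12 clients, 10 handlers -> queued: " + queuedRequests(10, 12)); // 2
    }
}
```

Raising the handler count only matters once the queue is non-empty, which is why it is worth measuring where the time goes before tuning it.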
-
Re: HBase reading test - y_823910@... 2010-01-05, 07:53
My read steps are as follows: each scan's results become the condition for the next scan. Could the slowdown be caused by multiple users scanning the index tables? Has anyone experienced this (multiple users concurrently scanning the same data slowing HBase down)?

One index value
    |
    | scan
Table1 --- Table1-idxColumn
    |
    |
Results
    |
    | scan
Table2 --- Table2-idxColumn
    |
    |
Results
    .
    . scan
    .
    .
Table5 --- Table5-idxColumn
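The chained read path above can be modeled in memory to make the data flow concrete. This is a hypothetical sketch: plain `TreeMap`s stand in for the index tables, and the `value|rowId` key layout is an assumption, since the real schema isn't shown. A prefix range scan in each stage plays the role of a scan with start and stop rows, and that stage's results become the next stage's scan condition, so one lookup issues one scan per table for every intermediate value.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

public class ChainedIndexLookup {

    /**
     * Each stage is a sorted map standing in for one index table, keyed by
     * "indexValue|rowId", whose value is carried into the next stage.
     */
    public static Set<String> lookup(List<NavigableMap<String, String>> stages, String start) {
        Set<String> current = new LinkedHashSet<>();
        current.add(start);
        for (NavigableMap<String, String> stage : stages) {
            Set<String> next = new LinkedHashSet<>();
            for (String value : current) {
                String prefix = value + "|";
                // Range scan over all keys that begin with the prefix (start row .. stop row).
                next.addAll(stage.subMap(prefix, true, prefix + '\uffff', false).values());
            }
            current = next; // this stage's results are the next stage's scan condition
        }
        return current;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table1 = new TreeMap<>();
        table1.put("idx42|row1", "A");
        table1.put("idx42|row2", "B");
        table1.put("idx43|row3", "C"); // different index value, not matched
        NavigableMap<String, String> table2 = new TreeMap<>();
        table2.put("A|row9", "final-1");
        table2.put("B|row7", "final-2");
        List<NavigableMap<String, String>> stages = new ArrayList<>();
        stages.add(table1);
        stages.add(table2);
        System.out.println(lookup(stages, "idx42")); // prints [final-1, final-2]
    }
}
```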
-
Re: HBase reading test - y_823910@... 2010-01-07, 06:55
Hi,
I've found the root cause of why multiple reading users lowered HBase performance. I was always creating a new HTable inside a shared function, which kept the region server holding the meta information very busy! After fixing the following code, read performance is fantastic:

1 concurrent client read: 27 sec
2 concurrent client reads: 28 sec
4 concurrent client reads: 36 sec

public Vector<String> ScanHBase(String tablename, String columnfamily, String KeyColumn,
        String StartKeyValue, String StopKeyValue) throws IOException {
    HTable table = new HTable(config, tablename); //-- bad writing: a new HTable on every call
    .
    .
    .
}
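The fix above amounts to constructing the table handle once and reusing it. A minimal sketch of that pattern follows, with a hypothetical `TableHandle` standing in for `HTable` so it runs without a cluster; its constructor only increments a counter to represent the `.META.` lookup J-D mentions in the next reply. Handles are cached per thread and per table name because `HTable` instances are documented (at least in later HBase releases) as not thread-safe. The class and method names are illustrative, not HBase API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class TableHandleReuse {

    /** Counts simulated .META. lookups across all handles. */
    static final AtomicInteger META_LOOKUPS = new AtomicInteger();

    /** Hypothetical stand-in for HTable: constructing one costs a .META. lookup. */
    static final class TableHandle {
        final String name;

        TableHandle(String name) {
            this.name = name;
            META_LOOKUPS.incrementAndGet(); // the expensive part of construction
        }

        String scan(String start, String stop) {
            return name + "[" + start + "," + stop + ")"; // placeholder for a real scan
        }
    }

    /** The pattern from the email: a new handle on every call. */
    static String scanNewHandleEachTime(String table, String start, String stop) {
        TableHandle t = new TableHandle(table); // bad: one lookup per call
        return t.scan(start, stop);
    }

    /** One handle per table per thread, created lazily and then reused. */
    private static final ThreadLocal<Map<String, TableHandle>> HANDLES =
            ThreadLocal.withInitial(HashMap::new);

    static String scanReusingHandle(String table, String start, String stop) {
        TableHandle t = HANDLES.get().computeIfAbsent(table, TableHandle::new);
        return t.scan(start, stop);
    }

    public static void main(String[] args) {
        META_LOOKUPS.set(0);
        for (int i = 0; i < 100; i++) scanNewHandleEachTime("Table1", "a", "b");
        System.out.println("new handle per call, lookups: " + META_LOOKUPS.get()); // 100

        META_LOOKUPS.set(0);
        for (int i = 0; i < 100; i++) scanReusingHandle("Table1", "a", "b");
        System.out.println("reused handle, lookups: " + META_LOOKUPS.get()); // 1
    }
}
```

With many concurrent clients each making many scans, the per-call version multiplies the lookups by the number of calls, while the cached version pays once per thread and table.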
-
Re: HBase reading test - Jean-Daniel Cryans 2010-01-07, 19:12
Yeah, instantiating an HTable is very expensive since it pings .META. once. Glad you could resolve your issue!

J-D