-
how many regions a regionserver can support
Jinsong Hu, 2010-08-27, 17:02

Hi there,

Does anybody know how many regions a regionserver can support? My regionservers have 8 GB of RAM, 1.5 TB of disk, and a 4-core CPU. I found http://www.facebook.com/note.php?note_id=142473677002, which says Google's target is 100 regions of 200 MB per regionserver.

In my case, I have 2700 regions spread across 6 regionservers. Each region is set to the default size of 256 MB, and it still seems to be running fine. I am running CDH3. I am wondering what the upper limit is so that I can do capacity planning. Does anybody know?

Jimmy
-
RE: how many regions a regionserver can support
Jonathan Gray, 2010-08-27, 17:55

There is no fixed limit; it has much more to do with the read/write load than with the actual dataset size.

HBase is usually fine with very densely packed RegionServers if much of the data is rarely accessed. If you have an extremely high number of regions per server and you are writing to all of them, or even reading from all of them, you could have issues.

Though storage capacity needs to be considered, capacity planning often has much more to do with how much memory you need to support the read/write load you expect. For reads this is mostly a performance concern, but for writes there are some important considerations related to the number of regions per server (and thus data density and choosing your max region size).

In any case, you should probably increase your max region size to 1 GB or so, and you can go higher if necessary.

JG
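To put rough numbers on this advice, here is a minimal sketch of the arithmetic, taking the figures from the original question at face value (2700 regions, 6 regionservers, 256 MB regions) and assuming, as an upper bound only, that every region is full. The function names are made up for illustration and are not part of HBase.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def regions_per_server(total_regions: int, servers: int) -> float:
    """Average number of regions each regionserver hosts."""
    return total_regions / servers


def regions_needed(data_bytes: int, max_region_bytes: int) -> int:
    """Regions required to hold data_bytes if every region is filled to the max size."""
    return math.ceil(data_bytes / max_region_bytes)


# Figures from the question; "every region is full" is an assumption (upper bound).
current_per_server = regions_per_server(2700, 6)
upper_bound_data = 2700 * 256 * MB

# The same amount of data with the max region size raised to 1 GB.
regions_at_1gb = regions_needed(upper_bound_data, 1 * GB)

print(f"regions per server today: {current_per_server:.0f}")
print(f"regions needed at 1 GB:   {regions_at_1gb}")
print(f"per server at 1 GB:       {regions_per_server(regions_at_1gb, 6):.1f}")
```

Under these assumptions, raising the max region size from 256 MB to 1 GB cuts the region count by a factor of four for the same data. Whether that matters in practice depends on the read/write load, as the reply above stresses.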
-
Re: how many regions a regionserver can support
Ryan Rawson, 2010-08-28, 20:05

The only downside to having so many regions on a regionserver is that opening and reassigning them is not as fast as you'd like.

In the future, with increased parallelism and other improvements, we will be able to speed it up.
-
Re: how many regions a regionserver can support
Scott Whitecross, 2010-08-29, 04:00

Can you explain this a bit more? I thought one benefit of small regions was better performance when pulling blocks. Is there a recommended upper limit on the number of regions per region server?
-
Re: how many regions a regionserver can support
Jinsong Hu, 2010-09-01, 18:21

I ran a test on a 6-regionserver cluster with a key design that spreads the incoming data across all regions. After pumping in data for 3-4 days, about 3 TB in total, one of the regionservers shut down because of a channel I/O error. On a 3-regionserver cluster with the same key design, the regionservers shut down after only 45 GB of data had been inserted.

I noticed that if the key is designed so that writes go not to all regions but only to a small portion of them, and that portion is spread approximately evenly among all regionservers, then the HDFS size becomes the limit on the total number of regions that can be supported, and I don't run into this I/O issue.

Can anybody share an actual example of HBase data size and cluster size?

Jimmy
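The thread does not show the actual key design used in these tests. Purely as an illustration of the two behaviours described above, the sketch below contrasts a hypothetical hash-prefixed ("salted") row key, which scatters consecutive writes across the whole key space and therefore across all regions, with a hypothetical time-ordered key, which concentrates new writes at the tail of the key space. The key layouts, the bucket count, and the function names are all assumptions.

```python
import hashlib


def salted_key(source_id: str, timestamp_ms: int, buckets: int = 16) -> bytes:
    """Hypothetical key: a hash-derived bucket prefix scatters writes over the key space."""
    digest = hashlib.md5(source_id.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}|{source_id}|{timestamp_ms:013d}".encode("utf-8")


def time_ordered_key(timestamp_ms: int, source_id: str) -> bytes:
    """Hypothetical key: leading timestamp sends new writes to the end of the key space."""
    return f"{timestamp_ms:013d}|{source_id}".encode("utf-8")


sources = [f"host{i}" for i in range(200)]

# Salted keys for one instant land under many different prefixes.
prefixes = {salted_key(s, 1283365260000)[:2] for s in sources}
print(f"distinct salted prefixes touched: {len(prefixes)}")

# Time-ordered keys for the same instant all share one leading range.
leading = {time_ordered_key(1283365260000, s)[:13] for s in sources}
print(f"distinct time-ordered prefixes touched: {len(leading)}")
```

With the salted layout every region sees writes all the time, which matches the first test above. With the time-ordered layout only the regions holding the newest keys are active, which is closer to the second case, though it trades that for a write hotspot.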
-
Re: how many regions a regionserver can support
Jean-Daniel Cryans, 2010-09-01, 18:35

Is that really a good test? Unless you are planning to write about 1 TB of new data per day into HBase, I don't see how you are testing capacity; you're more likely testing how well HBase can sustain a constant import of a lot of data. Regarding that, I'd be interested in knowing the exact circumstances of the region server failure.

As for a real-life example, one of our clusters has about 2.5 TB of LZO-compressed data (not sure about the raw size) according to dfs -du, on 20 nodes (FWIW). When trying to reach high density on your nodes, be sure to compress your data and set the split size bigger than the default of 256 MB, or you'll end up with too many regions.

J-D
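The "about 1 TB per day" figure follows from the numbers reported for the test (roughly 3 TB over 3-4 days). A minimal sketch of that arithmetic, assuming binary units (1 TB = 1024^4 bytes) and a perfectly even load across the 6 regionservers, neither of which the thread confirms:

```python
TB = 1024 ** 4
MB = 1024 ** 2
SECONDS_PER_DAY = 24 * 60 * 60


def tb_per_day(total_bytes: float, days: float) -> float:
    """Average daily ingest in TB."""
    return total_bytes / days / TB


def mb_per_second(total_bytes: float, days: float) -> float:
    """Average sustained ingest rate in MB/s over the whole run."""
    return total_bytes / (days * SECONDS_PER_DAY) / MB


total = 3 * TB  # reported volume before the first regionserver went down
for days in (3, 4):
    rate = mb_per_second(total, days)
    print(
        f"{days} days: {tb_per_day(total, days):.2f} TB/day, "
        f"{rate:.1f} MB/s cluster-wide, {rate / 6:.1f} MB/s per regionserver"
    )
```

These are averages of the logical data written. They ignore HDFS replication, compaction rewrites, and bursts, so the I/O actually seen by the datanodes would be higher.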
-
Re: how many regions a regionserver can support
Jinsong Hu, 2010-09-01, 19:10

Yes, I am indeed testing the sustained rate. The channel I/O exception shows that I/O killed the regionserver.

The datanode side shows:

2010-08-28 23:46:27,854 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_7209586757797236713_2442298 java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[connected local=/10.110.24.89:50010 remote=/10.110.24.89:42524]. 0 millis timeout left.

The regionserver side shows:

2010-08-28 23:47:13,148 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_7209586757797236713_2442298java.io.EOFException

I agree that if the insertion rate were slower, we could hold more data in HBase. In this case, I do want to stress test HBase and see what the limit is. Our application continuously collects data from the network and inserts it into HBase, and I want to see what happens in the extreme cases. It looks like channel I/O doesn't become the bottleneck under such a stress test.

dfs -dus shows we had 1.17 TB of data when one of the regionservers crashed. The data is gzip-compressed, as I found that gzip compression actually gives a better write rate. I may test a larger region size later. A previous test with 2 GB also caused lots of I/O, and the HBase regionserver eventually crashed too.

Jimmy
-
RE: how many regions a regionserver can support
Jonathan Gray, 2010-09-01, 19:17

Again, the read/write load has much more to do with cluster sizing than the dataset size does (total capacity aside).

To give you an idea of how widely it varies: I had a client who put several hundred GB of data onto a single-node HBase setup. I've also seen clusters of 20-100 nodes holding only tens of GB (with a very high concurrent write load). Recently I've been playing with a 100-node cluster with about 20 TB of data on it (before replication). Each of these clusters had a very different load profile.

And node count is not the only important metric. That one-node cluster had a pair of 2 TB disks, while these 100-node clusters are packed with twelve 1 TB disks per node.

JG
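To compare the examples given in this thread on a common scale, the sketch below works out data per node from the figures as quoted (J-D's 2.5 TB on 20 nodes, JG's roughly 20 TB on 100 nodes with twelve 1 TB disks each). It assumes binary units and takes the reported sizes at face value; the helper names are illustrative only.

```python
def gb_per_node(total_tb: float, nodes: int) -> float:
    """Average data per node in GB, assuming 1 TB = 1024 GB and an even spread."""
    return total_tb * 1024 / nodes


def disk_fill_fraction(data_gb: float, raw_disk_tb: float) -> float:
    """Share of a node's raw disk taken by data_gb, ignoring replication overhead."""
    return data_gb / (raw_disk_tb * 1024)


examples = [
    ("J-D: 2.5 TB LZO-compressed on 20 nodes", 2.5, 20),
    ("JG: about 20 TB before replication on 100 nodes", 20.0, 100),
]
for label, total_tb, nodes in examples:
    print(f"{label}: {gb_per_node(total_tb, nodes):.1f} GB per node")

# JG's 100-node cluster has twelve 1 TB disks per node.
jg_density = gb_per_node(20.0, 100)
print(f"share of raw disk used on JG's cluster: {disk_fill_fraction(jg_density, 12):.1%}")
```

Both clusters hold on the order of 100-200 GB of data per node, far below what their disks could store, which is consistent with the point that load profile rather than dataset size drives sizing.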
-
Re: how many regions a regionserver can support
Scott Whitecross, 2010-09-01, 19:56

"be sure to compress your data and set the split size bigger than the default of 256MB or you'll end up with too many regions."

How many regions are too many? I have a decent-sized cluster (~30 nodes) and started inserting new data, and I noticed that after a day I went from 30 regions on each server to 60. That is using the default region size. I haven't tested increasing the region file size, as I'm concerned about scan performance.