|
Wayne
2010-12-17, 17:09
Jean-Daniel Cryans
2010-12-17, 19:46
Wayne
2010-12-17, 20:29
Jonathan Gray
2010-12-17, 21:30
Wayne
2010-12-17, 22:28
Jonathan Gray
2010-12-17, 22:46
Wayne
2010-12-20, 14:33
Stack
2010-12-20, 16:42
Wayne
2010-12-20, 17:12
Stack
2010-12-20, 17:18
M. C. Srivas
2011-02-18, 07:15
Ryan Rawson
2011-02-18, 07:32
Wayne
2011-02-18, 18:12
Jason Rutherglen
2011-02-18, 19:36
Jean-Daniel Cryans
2011-02-18, 19:43
Chris Tarnas
2011-02-18, 19:50
Jean-Daniel Cryans
2011-02-18, 19:59
Chris Tarnas
2011-02-18, 20:05
Ted Dunning
2011-02-18, 20:08
Jean-Daniel Cryans
2011-02-18, 20:10
Todd Lipcon
2011-02-18, 22:46
Wayne
2011-02-19, 14:43
Stack
2011-02-19, 19:48
Jean-Daniel Cryans
2011-02-19, 23:22
|
-
Cluster Size/Node Density Wayne 2010-12-17, 17:09
We would like some help with cluster sizing estimates. We have 15TB of currently relational data that we want to store in HBase. Once that is replicated to a factor of 3 and stored with secondary indexes etc., we assume we will have 50TB+ of data. The data is basically data-warehouse-style time series data where much of it is cold; however, we want good read latency to get access to all of it. Not memory-based latency, but < 25ms latency for small chunks of data.

How many nodes, regions, etc. are we going to need? Assuming a typical 6 disk, 24GB RAM, 16 core data node, how many of these do we need to sufficiently manage this volume of data? Obviously there are a million "it depends", but the bigger drivers are: how much data can a node handle? How long will compaction take? How many regions can a node handle and how big can those regions get? Can we really have 1.5TB of data on a single node in 6,000 regions? What are the true drivers between more nodes vs. bigger nodes? Do we need 30 nodes to handle our 50TB of data or 100 nodes? What will our read latency be for 30 vs. 100? Sure, we can pack 20 nodes with 3TB of data each, but will it take 1+s for every get? Will compaction run for 3 days? How much data is really "too much" on an HBase data node?

Any help or advice would be greatly appreciated.

Thanks
Wayne
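The node-density figures in this message can be checked with some quick arithmetic. The following is only a rough sketch: it assumes the 50TB replicated total is spread perfectly evenly across nodes and uses binary units (1TB = 1024 * 1024 MB); the function names are illustrative and not part of any HBase tool.

```python
def per_node_tb(total_tb: float, nodes: int) -> float:
    """Average data per node if the total is spread evenly (an assumption)."""
    return total_tb / nodes


def avg_region_mb(node_tb: float, regions: int) -> float:
    """Average region size in MB for a node holding node_tb across `regions` regions."""
    return node_tb * 1024 * 1024 / regions


TOTAL_TB = 50.0  # replicated total quoted in the message

for nodes in (20, 30, 100):
    print(f"{nodes:>3} nodes -> {per_node_tb(TOTAL_TB, nodes):.2f} TB per node")

# The "1.5TB in 6,000 regions" case works out to roughly a quarter of a GB per region.
print(f"1.5 TB in 6,000 regions -> {avg_region_mb(1.5, 6000):.0f} MB per region")
```

Changing the region count or node count shows how quickly per-node density moves; it says nothing about latency, which the replies below tie to concurrency rather than data volume.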
-
Re: Cluster Size/Node Density Jean-Daniel Cryans 2010-12-17, 19:46
Hi Wayne,

This question has such a large scope but is applicable to such a tiny subset of workloads (e.g. yours) that fielding all the questions in detail would probably end up just wasting everyone's cycles. So first I'd like to clear up some confusion.

> We would like some help with cluster sizing estimates. We have 15TB of
> currently relational data we want to store in hbase.

There's the 3x replication factor, but you also have to account for the fact that each value is stored with its row key, family name, qualifier and timestamp. That could be a lot more data to store, but at the same time you can use LZO compression to bring that down ~4x.

> How many nodes, regions, etc. are we going to need?

You don't really have control over regions; they are created for you as your data grows.

> What will our read latency be for 30 vs. 100? Sure we can pack 20 nodes with 3TB
> of data each but will it take 1+s for every get?

I'm not sure what kind of back-of-the-envelope calculations took you to 1sec, but latency will be strictly determined by concurrency and actual machine load. Even if you were able to pack 20TB in one node but were using only a tiny portion of it, you would still get sub-100ms latencies. Or if you have only 10GB on that node but it's getting hammered by 10000 clients, then you should expect much higher latencies.

> Will compaction run for 3 days?

Which compactions? Major ones? If you don't insert new data in a region, it won't be major compacted. Also, if you have that much data, I would set the time between major compactions to be bigger than 1 day. Heck, since you are doing time series, this means you'll never delete anything, right? So you might as well disable them.

And now for the meaty part...

The size of your dataset is only one part of the equation, the other being the traffic you would be pushing to the cluster, which I think wasn't covered at all in your email. Like I said previously, you can pack a lot of data in a single node and can retrieve it really fast as long as concurrency is low. Another thing is how random your reading pattern is... can you even leverage the block cache at all? If yes, then you can accept more concurrency; if not, then hitting HDFS is a lot slower (and it's still not very good at handling many clients).

Unfortunately, even if you told us exactly how many QPS you want to do, we'd have a hard time recommending any number of nodes.

What I would recommend then is to benchmark it. Try to grab 5-6 machines, load a subset of the data, generate traffic, see how it behaves.

Hope that helps,

J-D
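The storage factors mentioned in this reply (per-value key overhead, ~4x LZO compression, 3x replication) can be combined into a back-of-the-envelope footprint. This is a sketch only: the `key_overhead` multiplier of 2.0 is a made-up placeholder, since the real figure depends on row key, family and qualifier lengths relative to value size, and the ~4x compression ratio is the rough number quoted above rather than a measurement.

```python
def stored_tb(raw_tb: float,
              key_overhead: float = 2.0,   # hypothetical: key/family/qualifier/timestamp bloat
              compression: float = 4.0,    # ~4x LZO ratio quoted in the reply
              replication: int = 3) -> float:
    """Estimated on-disk footprint in TB after overhead, compression and replication."""
    return raw_tb * key_overhead / compression * replication


# 15TB of source data under the placeholder assumptions.
print(f"{stored_tb(15.0):.1f} TB on disk")

# With no overhead and no compression this is just 3x replication.
print(f"{stored_tb(15.0, key_overhead=1.0, compression=1.0):.1f} TB on disk")
```

Measuring the actual overhead and compression ratio on a loaded sample of the data would replace both placeholder factors.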
-
Re: Cluster Size/Node Density Wayne 2010-12-17, 20:29
Sorry, I am sure my questions were far too broad to answer.

Let me *try* to ask more specific questions. Assuming all data requests are cold (random reading pattern) and everything comes from the disks (no block cache), what level of concurrency can HDFS handle? Almost all of the load is controlled data processing, but we have to do a lot of work at night during the batch window, so something in the 15,000-20,000 QPS range would meet current worst-case requirements. How many nodes would be required to effectively return data against a 50TB aggregate data store? Disk I/O presumably starts to break down at a certain point in terms of concurrent readers/node/disk.

We have in our control how many total concurrent readers there are, so if we can get 10ms response time with 100 readers that might be better than 100ms responses from 1000 concurrent readers.

Thanks.
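The trade-off between reader count and response time can be made concrete with Little's law (throughput = concurrency / latency). This is a minimal sketch, assuming a closed loop in which each reader issues one request at a time with no think time; the function names are illustrative.

```python
def throughput_qps(readers: int, latency_ms: float) -> float:
    """Requests per second sustained by `readers` back-to-back readers."""
    return readers * 1000.0 / latency_ms


def readers_needed(target_qps: float, latency_ms: float) -> float:
    """Concurrent readers required to reach target_qps at a given latency."""
    return target_qps * latency_ms / 1000.0


# The two cases from the message deliver the same aggregate throughput.
print(throughput_qps(100, 10))    # 100 readers at 10ms
print(throughput_qps(1000, 100))  # 1000 readers at 100ms

# Readers needed for the 15,000-20,000 QPS batch window at various latencies.
for target in (15_000, 20_000):
    for latency in (10, 25, 100):
        print(f"{target} qps at {latency}ms -> {readers_needed(target, latency):.0f} readers")
```

Under these assumptions both scenarios yield the same throughput, so the lower-concurrency option is preferable only if the cluster can actually hold 10ms at that load, which is what a benchmark would have to show.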
-
RE: Cluster Size/Node Density Jonathan Gray 2010-12-17, 21:30
You absolutely need to do some testing and benchmarking.

This sounds like the kind of application that will require lots of tuning to get right. It also sounds like the kind of thing HDFS is typically not very good at.

There is an increasing amount of activity in this area (optimizing HDFS for random reads) and lots of good ideas. HDFS-347 would probably help tremendously for this kind of high random read rate, bypassing the DataNode (DN) completely.

JG
-
Re: Cluster Size/Node Density Wayne 2010-12-17, 22:28
What can we expect from HDFS in terms of random reads? It is our own load, so we can "shape" it to a degree to be more "optimized" to how HBase/HDFS prefers to function. We have a 10 node cluster we have been testing another NoSQL solution on, and we can try to test with that, but I guess I am trying to do a gut check on what we are trying to do before moving to a different NoSQL solution (and wasting more R&D time). Concurrent reads, and read latency from disk I/O based reads degrading as the total data stored on the node increases, are the wall we have hit with the other NoSQL solution. We totally understand the limitations of disks and disk I/O; that has always been the enemy of large databases. SSDs and memory are currently too expensive to solve our problem. We want our limit to be what the physical disks can handle, and everything else to be a thin layer on top. We are looking for a solution where we know what each node can perform in terms of concurrent read/write load, and we then decide on the number of nodes based on required Gets/Puts per second.

Can we store 15GB of data (before replication - 45GB+ after) on 30 nodes, and sustain 120 disk-based readers returning data consistently in under 25ms? That is 40 reads/sec/thread or around 5,000 qps. Is this specific scenario in the realm of the possible, making all kinds of assumptions? If 25ms is too fast, is 50ms more likely? Is 100ms more likely? If we assume 100ms, can it handle 240 readers at that rate? Concurrency will go down once the disk utilization is saturated, and latency is fundamentally based on random disk I/O latency, but we are looking for what HBase can handle.

I am sorry for such general questions, but I am trying to do a gut check before diving into a long testing scenario.

Thanks.
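To relate the reader scenarios above to what the physical disks would see, the aggregate rate can be divided across spindles. This is a hedged sketch: it assumes 30 nodes with 6 disks each (the node type from the first message), a perfectly even spread of reads, and exactly one disk read per get, which will understate the load if a get also has to fetch index blocks or touch several store files.

```python
def cluster_qps(readers: int, latency_ms: float) -> float:
    """Aggregate gets/sec from `readers` threads each issuing back-to-back gets."""
    return readers * 1000.0 / latency_ms


def reads_per_disk(readers: int, latency_ms: float,
                   nodes: int = 30, disks_per_node: int = 6) -> float:
    """Average reads/sec per disk, assuming one disk read per get (an assumption)."""
    return cluster_qps(readers, latency_ms) / (nodes * disks_per_node)


for readers, latency in ((120, 25), (120, 50), (240, 100)):
    total = cluster_qps(readers, latency)
    per_disk = reads_per_disk(readers, latency)
    print(f"{readers} readers at {latency}ms -> {total:.0f} qps, "
          f"{per_disk:.1f} reads/sec/disk")
```

The 120-reader, 25ms case comes to 4,800 qps, matching the "around 5,000 qps" in the message. Whether the per-disk rate is sustainable at the target latency depends on the drives and on how many disk reads each get really costs, which is what the suggested benchmark would measure.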
-
RE: Cluster Size/Node Density Jonathan Gray 2010-12-17, 22:46
You meant 15TB/45TB, right?

Your numbers seem in the realm of possibility. You should try it out on your 10 node cluster if you can. I've done applications like this in the past with a large dataset and just random reads, and HBase has performed well. I also took advantage of HFileOutputFormat to write the data quickly. But it was not 5000 qps; that app was only in the 100s.

Ensure that your reads are Get operations with HBase, as those will use HDFS pread instead of seek/read. For this application, you absolutely must be using pread.

Good luck. I'm interested in seeing how you can get HBase to perform; we are here to help if you have any issues.

JG
-
Re: Cluster Size/Node Density Wayne 2010-12-20, 14:33
Yes, I meant 15TB/45TB.

The pread, I assume, translates into a get/getRow vs. opening a scanner? For reads we are going to have to go through Thrift from Python; does that raise more concerns? We assume we will have to use Java/Jython for writes, based on what we have seen in terms of published performance benchmarks of Thrift vs. Java, but for reads we have to use Python.

Thanks.
-
Re: Cluster Size/Node Density Stack 2010-12-20, 16:42
On Mon, Dec 20, 2010 at 6:33 AM, Wayne <[EMAIL PROTECTED]> wrote:
> Yes I meant 15TB/45TB
>
> The pread I assume translates into a get/getRow vs. opening a scanner? For
> reads we are going to have to go through thrift from python, does that raise
> more concerns?

No. Should be fine.

> We assume we will have to use java/jython for writes based on
> what have seen in terms of published performance benchmarks of thrift vs.
> java but for reads we have to use python.

You should start out writing from Python via Thrift. See how it goes before going to Java or Java via Jython.

You say above that it's your data. Can you shape it so accesses are Scans rather than random reads?

St.Ack
-
Re: Cluster Size/Node Density Wayne 2010-12-20, 17:12
Can we control the WAL and write buffer size via Thrift? We assume we have to use Java for writes to get access to the settings below, which we assume we need in order to get extremely fast writes. We are looking for something in the range of 100k writes/sec for the cluster as a whole.

    p.setWriteToWAL(false);
    hTable.setAutoFlush(false);
    hTable.setWriteBufferSize(1024*1024*12);

In terms of reshaping our reads to be scans, I do not see how we can do that at this point. Are you suggesting that we move to a map/reduce pattern to crawl through the data?

Thanks.
-
Re: Cluster Size/Node DensityStack 2010-12-20, 17:18
On Mon, Dec 20, 2010 at 9:12 AM, Wayne <[EMAIL PROTECTED]> wrote:
> Can we control the WAL and write buffer size via thrift? We assume we have
> to use java for writes to get access to the settings below which we assume
> we need to get extremely fast writes. We are looking for something in the
> range of 100k writes/sec for the cluster as a whole.
>
> p.setWriteToWAL(false);
> hTable.setAutoFlush(false);
> hTable.setWriteBufferSize(1024*1024*12);

For fast upload, use MapReduce and write the hbase files directly, bypassing the API:
http://people.apache.org/~stack/hbase-0.90.0-candidate-1/docs/bulk-loads.html

Otherwise, yes, the thrift API does not give you access to the above (you might be able to set a few of them via configuration, IIRC).

> In terms of reshaping our reads to be scans, I do not see how we can do that
> at this point. Are you suggesting that we move to a map/reduce pattern to
> crawl through the data?

I'm just suggesting that if you can somehow Scan rather than random read, then your QPS will be at least an order of magnitude higher.

St.Ack
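To illustrate why the client settings quoted above matter for write throughput: with auto-flush off, the Java client holds Puts locally and ships them as one batch once the buffered bytes pass the write buffer size (12MB in the snippet), so many writes share a single round trip. The Python sketch below mimics only that batching idea. `BufferedWriter`, `send_batch`, and the byte accounting are hypothetical names made up for illustration; they are not part of the HBase or Thrift APIs.

```python
class BufferedWriter:
    """Hypothetical client-side write buffer (illustration only, not an HBase API)."""

    def __init__(self, send_batch, buffer_bytes=12 * 1024 * 1024):
        self.send_batch = send_batch      # callable that ships one batch of (row, value) pairs
        self.buffer_bytes = buffer_bytes  # flush threshold, analogous to the write buffer size
        self.pending = []
        self.pending_bytes = 0
        self.round_trips = 0

    def put(self, row, value):
        """Queue one write; ship the whole batch once the threshold is reached."""
        self.pending.append((row, value))
        self.pending_bytes += len(row) + len(value)
        if self.pending_bytes >= self.buffer_bytes:
            self.flush()

    def flush(self):
        """Send everything queued so far in a single round trip."""
        if not self.pending:
            return
        self.send_batch(self.pending)
        self.round_trips += 1
        self.pending = []
        self.pending_bytes = 0
```

A client going through Thrift could get a similar effect by collecting mutations itself and sending them together, with the same trade-off as the Java buffer: anything still queued is lost if the client dies before the flush.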
-
Re: Cluster Size/Node DensityM. C. Srivas 2011-02-18, 07:15
I was reading this thread with interest. Here's my $.02

On Fri, Dec 17, 2010 at 12:29 PM, Wayne <[EMAIL PROTECTED]> wrote:
> Sorry, I am sure my questions were far too broad to answer.
>
> Let me *try* to ask more specific questions. Assuming all data requests are
> cold (random reading pattern) and everything comes from the disks (no block
> cache), what level of concurrency can HDFS handle?

Cold cache, random reads ==> totally governed by seeks, so governed by # of spindles per box.

A SATA drive can do about 100 random seeks per sec, i.e., 100 reads/second.

> Almost all of the load is
> controlled data processing, but we have to do a lot of work at night during
> the batch window so something in the 15-20,000 QPS range would meet current
> worse case requirements. How many nodes would be required to effectively
> return data against a 50TB aggregate data store?

Assuming 12 drives per node, and a cache hit rate of 0% (since it's mostly cold cache), you will see about 12 * 100 = 1200 reads per second per node. If your cache hit rate goes up to 25%, then your read rate is 1600 reads/sec/node.

Thus, 10 machines can serve about 12k-16k reads/sec, cold cache.

50TB of data on 10 machines => 5 TB/node. That might be a bit too much for each region server (a RS can do about 700 regions comfortably, each of 1G). If you push, you might get 2TB/regionserver, or 25 machines in total. If the data compresses 50%, then 12-13 nodes.

So, for your scenario, it's about 12-13 RS's, with 12 drives each, and you will comfortably do 24k QPS cold cache.

Does that help?

> Disk I/O assumedly starts
> to break down at a certain point in terms of concurrent readers/node/disk.
> We have in our control how many total concurrent readers there are, so if
> we can get 10ms response time with 100 readers that might be better than 100ms
> responses from 1000 concurrent readers.
>
> Thanks.
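The back-of-the-envelope sizing in this message can be written down as a small calculator. This is a sketch under the message's own assumptions (about 100 random seeks per second per SATA drive, one seek per cache miss, 2TB per region server as the upper bound); the function and parameter names are made up for illustration.

```python
import math


def reads_per_sec_per_node(drives, seeks_per_drive=100, cache_hit_rate=0.0):
    """Reads/sec one node can serve if every cache miss costs one disk seek.

    cache_hit_rate must be below 1.0.
    """
    disk_reads = drives * seeks_per_drive
    return disk_reads / (1.0 - cache_hit_rate)


def nodes_for_storage(raw_tb, tb_per_node, compressed_fraction=1.0):
    """Nodes needed to hold raw_tb after it shrinks to compressed_fraction of its size."""
    return math.ceil(raw_tb * compressed_fraction / tb_per_node)


for hit_rate in (0.0, 0.25):
    per_node = reads_per_sec_per_node(12, cache_hit_rate=hit_rate)
    for nodes in (10, 12, 13):
        print(f"hit rate {hit_rate:.0%}, {nodes} nodes: {per_node * nodes:,.0f} reads/sec")
```

With these inputs, 12 to 13 nodes work out to roughly 14k to 21k reads/sec depending on the hit rate, so reaching 24k would take a higher hit rate or more spindles per node than assumed here.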
-
Re: Cluster Size/Node DensityRyan Rawson 2011-02-18, 07:32
And don't forget that reading that data from the RS does not use compression, so you are limited to about 120 MB/sec of read bandwidth per node, minus bandwidth used for HDFS replication and other incidentals.

GigE is just too damn slow.

I look forward to 10G; perhaps we'll start seeing DC buildouts of 10G next year.

-ryan
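To see when the roughly 120 MB/sec network limit matters more than disk seeks, one can compare the two ceilings for a given average read size. This is only a sketch: it uses decimal units (1 MB = 1000 KB), ignores the replication traffic mentioned above, and the function names are invented for illustration.

```python
def node_read_ceiling(disk_reads_per_sec, link_mb_per_sec, avg_read_kb):
    """Reads/sec a node can deliver: the lower of the disk limit and the network limit."""
    network_reads_per_sec = link_mb_per_sec * 1000.0 / avg_read_kb
    return min(disk_reads_per_sec, network_reads_per_sec)


def saturating_read_kb(disk_reads_per_sec, link_mb_per_sec):
    """Average read size (KB) at which the link, not the disks, becomes the limit."""
    return link_mb_per_sec * 1000.0 / disk_reads_per_sec
```

For a node doing 1200 seek-bound reads/sec, the link only becomes the bottleneck once the average read grows to around 100KB; small gets stay disk-bound, while large rows or scans hit the network first.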
-
Re: Cluster Size/Node DensityWayne 2011-02-18, 18:12
We have managed to get a little more than 1k QPS to date with 10 nodes. Honestly we are not quite convinced that disk i/o seeks are our biggest bottleneck. Of course they should be...but waiting for RPC connections, network latency, thrift etc. all play into the time to get reads. The std dev. of read time is way too high for our comfort, but os cache and other things make it hard to benchmark this. Our average read time has jumped to ~65ms which is a lot more than we expected (we expected 30-40ms). Our reads are not as small as originally thought, but the 65ms still seems high. We would love to have a scanOpenWithStop single rpc read call added (open, get all rows, close scanner in one rpc call). We are moving to 10k disks (5 data disks per node) so once we get some of these nodes in place we can see how things compare. I suspect things won't change much...which will confirm disk i/o is not our only bottleneck. I will be happy to be wrong...

Based on the performance we have seen we expect to need 40 nodes. With some fine tuning the 40 nodes can deliver 5k QPS which is what we need to run our application processes. We have a substantial write load as well, so our planning and numbers allow spare cycles for dealing with significant writes at the same time. We have cache set way low to just be used by .META. and all our tables have cache turned off. Memcached will be sitting on top to help client access. We are also using a 5GB region size to keep our region counts in the 100-200 range/node per Jonathan Gray's recommendation.

We have thought about going to infiniband or 10g, but with our smaller node sizes we don't think it will make much difference. The cost of infiniband for all 40 nodes buys us another 6 nodes...money better spent on scaling out.
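The relationship between reader count, latency, and QPS in these numbers follows Little's law, assuming each reader issues one synchronous request at a time. The sketch below uses figures from the thread (120 readers at 25ms, 1k and 5k QPS at 65ms, 40 nodes); the function names are made up for illustration.

```python
def qps_from_readers(readers, latency_ms):
    """Throughput of `readers` synchronous clients that each wait latency_ms per read."""
    return readers * 1000.0 / latency_ms


def readers_for_qps(target_qps, latency_ms):
    """Concurrent in-flight reads needed to sustain target_qps at the given latency."""
    return target_qps * latency_ms / 1000.0


print(qps_from_readers(120, 25))       # the 120-reader, 25ms scenario discussed earlier
print(readers_for_qps(1000, 65))       # in-flight reads at the measured 1k QPS
print(readers_for_qps(5000, 65) / 40)  # in-flight reads per node for 5k QPS on 40 nodes
```

At 65ms per read, 5k QPS means about 325 reads in flight across the cluster, or roughly 8 per node on 40 nodes. If latency climbs as concurrency rises, adding readers yields less than this linear estimate suggests.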
-
Re: Cluster Size/Node DensityJason Rutherglen 2011-02-18, 19:36
> We are also using a 5GB region size to keep our region
> counts in the 100-200 range/node per Jonathan Gray's recommendation.

So there isn't a penalty incurred from increasing the max region size from 256MB to 5GB?
-
Re: Cluster Size/Node DensityJean-Daniel Cryans 2011-02-18, 19:43
Fewer regions, but it's often a good thing if you have a lot of data :)

It's probably a good thing to bump the HDFS block size to 128 or 256MB since you know you're going to have huge-ish files.

But anyway, regarding penalties, I can't think of one that clearly comes out (unless you use a very small heap). The IO usage patterns will change, but unless you flush very small files all the time and need to recompact them into much bigger ones, it shouldn't really be an issue.

J-D
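The effect of the max region size on region counts is simple division. The sketch below assumes, purely for illustration, 500GB to 1000GB of data per node and regions that are all at their maximum size; the function name is hypothetical.

```python
import math


def regions_per_node(data_per_node_gb, max_region_gb):
    """Lower bound on regions a node hosts if every region is filled to its max size."""
    return math.ceil(data_per_node_gb / max_region_gb)


for region_gb in (0.25, 5):
    print(f"{region_gb}GB regions: {regions_per_node(1000, region_gb)} regions for 1000GB/node")
```

With 1000GB per node, 256MB regions give about 4000 regions, while 5GB regions give about 200, which matches the 100-200 range mentioned in the thread. Regions sit between half and full size after a split, so real counts run somewhat higher than this lower bound.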
-
Re: Cluster Size/Node DensityChris Tarnas 2011-02-18, 19:50
Would it be a good idea to raise the hbase.hregion.memstore.flush.size if you have really large regions?

-chris
-
Re: Cluster Size/Node DensityJean-Daniel Cryans 2011-02-18, 19:59
That's what I usually recommend, the bigger the flushed files the better. On the other hand, you only have so much memory to dedicate to the MemStore...

J-D
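The trade-off between flush size and MemStore memory can be made concrete: the heap share reserved for memstores, divided by the flush size, bounds how many regions can fill a memstore to the flush threshold at the same time. In this sketch the 8GB heap comes from the thread, while the 0.4 heap fraction and the 64MB and 256MB flush sizes are illustrative assumptions, and the function name is made up.

```python
import math


def full_memstores_that_fit(heap_mb, memstore_heap_fraction, flush_size_mb):
    """How many memstores can reach flush_size_mb before the shared budget runs out."""
    return math.floor(heap_mb * memstore_heap_fraction / flush_size_mb)


for flush_mb in (64, 256):
    count = full_memstores_that_fit(8192, 0.4, flush_mb)
    print(f"flush size {flush_mb}MB: about {count} regions can fill a memstore at once")
```

Quadrupling the flush size cuts the number of regions that can buffer a full memstore to roughly a quarter, so beyond that point flushes are forced early by memory pressure and the files come out smaller than the configured size anyway.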
-
Re: Cluster Size/Node DensityChris Tarnas 2011-02-18, 20:05
Thank you, and that brings me to my next question...

What is the current recommendation on the max heap size for HBase if RAM on the server is not an issue? Right now I am at 8GB and have no issues, can I safely do 12GB? The servers have plenty of RAM (48GB) so that should not be an issue - I just want to minimize the risk that GC will cause problems.

thanks again.
-chris
-
Re: Cluster Size/Node DensityTed Dunning 2011-02-18, 20:08
Actually, having a smaller heap will decrease the risk of a catastrophic GC. It probably will also increase the likelihood of a full GC.

Having a larger heap will let you go longer without a full GC, but with a very large heap a full GC may take your region server off-line long enough to be considered a failure. Then you will have cascading badness.
> >>> > >>> J-D > >>> > >>> On Fri, Feb 18, 2011 at 11:36 AM, Jason Rutherglen > >>> <[EMAIL PROTECTED]> wrote: > >>>>> We are also using a 5Gb region size to keep our region > >>>>> counts in the 100-200 range/node per Jonathan Grey's recommendation. > >>>> > >>>> So there isn't a penalty incurred from increasing the max region size > >>>> from 256MB to 5GB? > >>>> > >> > >> > >
-
Re: Cluster Size/Node DensityJean-Daniel Cryans 2011-02-18, 20:10
The bigger the heap, the longer the stop-the-world GC pause when fragmentation requires it; 8GB is "safer".

In 0.90.1 you can try enabling the new memstore allocator that seems to do a really good job. Check out the JIRA first:
https://issues.apache.org/jira/browse/HBASE-3455

J-D
>>>>> >>>>> So there isn't a penalty incurred from increasing the max region size >>>>> from 256MB to 5GB? >>>>> >>> >>> > >
-
Re: Cluster Size/Node DensityTodd Lipcon 2011-02-18, 22:46
On Fri, Feb 18, 2011 at 12:10 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> The bigger the heap the longer the GC pause of the world when
> fragmentation requires it, 8GB is "safer".

On my boxes, a stop-the-world on 8G heap is already around 80 seconds... pretty catastrophic. Of course we've bumped the ZK timeout up to several minutes these days, but it's just a bandaid.

> In 0.90.1 you can try enabling the new memstore allocator that seems
> to do a really good job, checkout the jira first:
> https://issues.apache.org/jira/browse/HBASE-3455

Yep. Hopefully will have time to do a blog post this weekend about it as well. In my testing, try as I might, I can't get my region servers to do a full GC anymore when this is enabled.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
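The interaction between heap size, full-GC pause, and the ZooKeeper session timeout can be sketched as a simple check. The 10 seconds per GB rate is derived from the 80 seconds on an 8G heap reported above, and scaling it linearly with heap size is a crude assumption; the timeout values and function names are hypothetical.

```python
def estimated_full_gc_pause_s(heap_gb, seconds_per_gb=10.0):
    """Rough stop-the-world pause, assuming it scales linearly with heap size."""
    return heap_gb * seconds_per_gb


def survives_full_gc(heap_gb, zk_timeout_s, seconds_per_gb=10.0):
    """True if the estimated pause is shorter than the ZooKeeper session timeout."""
    return estimated_full_gc_pause_s(heap_gb, seconds_per_gb) < zk_timeout_s


for heap_gb in (8, 12):
    for timeout_s in (90, 180):
        ok = survives_full_gc(heap_gb, timeout_s)
        print(f"{heap_gb}GB heap, {timeout_s}s timeout: {'survives' if ok else 'session expires'}")
```

Under this estimate a 12GB heap pauses for about two minutes, so it only survives a full GC when the timeout has been raised to several minutes, which is the band-aid described above. Avoiding the full GC in the first place is the more durable fix.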
-
Re: Cluster Size/Node DensityWayne 2011-02-19, 14:43
What JVM is recommended for the new memstore allocator? We switched from u23
back to u17 which helped a lot. Is this optimized for a specific JVM or does
it not matter?

On Fri, Feb 18, 2011 at 5:46 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:

> On Fri, Feb 18, 2011 at 12:10 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>
> > The bigger the heap the longer the GC pause of the world when
> > fragmentation requires it, 8GB is "safer".
> >
>
> On my boxes, a stop-the-world on 8G heap is already around 80 seconds...
> pretty catastrophic. Of course we've bumped the ZK timeout up to several
> minutes these days, but it's just a bandaid.
>
> > In 0.90.1 you can try enabling the new memstore allocator that seems
> > to do a really good job, checkout the jira first:
> > https://issues.apache.org/jira/browse/HBASE-3455
> >
>
> Yep. Hopefully will have time to do a blog post this weekend about it as
> well. In my testing, try as I might, I can't get my region servers to do a
> full GC anymore when this is enabled.
>
> -Todd
>
> > On Fri, Feb 18, 2011 at 12:05 PM, Chris Tarnas <[EMAIL PROTECTED]> wrote:
> > > Thank you, and that brings me to my next question...
> > >
> > > What is the current recommendation on the max heap size for HBase if RAM
> > > on the server is not an issue? Right now I am at 8GB and have no issues, can
> > > I safely do 12GB? The servers have plenty of RAM (48GB) so that should not
> > > be an issue - I just want to minimize the risk that GC will cause problems.
> > >
> > > thanks again.
> > > -chris
> > >
> > > On Feb 18, 2011, at 11:59 AM, Jean-Daniel Cryans wrote:
> > >
> > >> That's what I usually recommend, the bigger the flushed files the
> > >> better. On the other hand, you only have so much memory to dedicate to
> > >> the MemStore...
> > >>
> > >> J-D
> > >>
> > >> On Fri, Feb 18, 2011 at 11:50 AM, Chris Tarnas <[EMAIL PROTECTED]> wrote:
> > >>> Would it be a good idea to raise the hbase.hregion.memstore.flush.size
> > >>> if you have really large regions?
> > >>>
> > >>> -chris
> > >>>
> > >>> On Feb 18, 2011, at 11:43 AM, Jean-Daniel Cryans wrote:
> > >>>
> > >>>> Less regions, but it's often a good thing if you have a lot of data :)
> > >>>>
> > >>>> It's probably a good thing to bump the HDFS block size to 128 or 256MB
> > >>>> since you know you're going to have huge-ish files.
> > >>>>
> > >>>> But anyway regarding penalties, I can't think of one that clearly
> > >>>> comes out (unless you use a very small heap). The IO usage patterns
> > >>>> will change, but unless you flush very small files all the time and
> > >>>> need to recompact them into much bigger ones, then it shouldn't really
> > >>>> be an issue.
> > >>>>
> > >>>> J-D
> > >>>>
> > >>>> On Fri, Feb 18, 2011 at 11:36 AM, Jason Rutherglen
> > >>>> <[EMAIL PROTECTED]> wrote:
> > >>>>>> We are also using a 5GB region size to keep our region
> > >>>>>> counts in the 100-200 range/node per Jonathan Gray's recommendation.
> > >>>>>
> > >>>>> So there isn't a penalty incurred from increasing the max region size
> > >>>>> from 256MB to 5GB?
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
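For anyone following the region size discussion quoted above, here is a rough sketch of the arithmetic behind the 256MB vs. 5GB comparison. It uses the 1.5TB-per-node figure from the first message in this thread and assumes every region is filled to its maximum size, which real tables rarely are, so treat the results as a lower bound on region count rather than a prediction. The function name and constants are made up for illustration and are not part of HBase.

```python
# Rough region-count arithmetic; all figures are illustrative assumptions.
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def regions_per_node(data_per_node_bytes, max_region_bytes):
    """Approximate regions on a node if every region is filled to its maximum size."""
    # Ceiling division: a partially filled region still counts as a region.
    return -(-data_per_node_bytes // max_region_bytes)


# The 1.5TB-per-node figure comes from the original question in this thread.
data_per_node = int(1.5 * TB)

for label, size in (("256MB", 256 * MB), ("5GB", 5 * GB)):
    count = regions_per_node(data_per_node, size)
    print(f"max region size {label}: ~{count} regions per node")
```

With these assumptions the 256MB setting lands in the thousands of regions per node (roughly the 6,000 mentioned in the original question), while 5GB brings it down to a few hundred, which is closer to the 100-200 range per node recommended earlier in the thread.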
-
Re: Cluster Size/Node Density
Stack 2011-02-19, 19:48
It is not JVM version dependent.
Stack
-
Re: Cluster Size/Node Density
Jean-Daniel Cryans 2011-02-19, 23:22
That would be the second report I have seen in less than a week of u23 being
less stable than u17. Interesting...

J-D