Bradford Stephens
2010-09-01, 09:12
Andrew Purtell
2010-09-01, 11:14
Matthew LeMieux
2010-09-01, 14:24
Bradford Stephens
2010-09-01, 16:38
Jonathan Gray
2010-09-01, 17:04
Bradford Stephens
2010-09-01, 17:09
Gary Helmling
2010-09-01, 17:17
Andrew Purtell
2010-09-01, 17:45
Andrew Purtell
2010-09-01, 17:55
Bradford Stephens
2010-09-02, 00:03
Jean-Daniel Cryans
2010-09-02, 00:24
Bradford Stephens
2010-09-02, 00:37
Ryan Rawson
2010-09-02, 00:54
Bradford Stephens
2010-09-02, 00:56
Ryan Rawson
2010-09-02, 00:59
Bradford Stephens
2010-09-02, 01:58
Jonathan Gray
2010-09-02, 03:23
Andrew Purtell
2010-09-02, 19:10
Bradford Stephens
2010-09-03, 00:21
Matthew LeMieux
2010-09-03, 20:38
-
Slow Inserts on EC2 Cluster
Bradford Stephens
2010-09-01, 09:12

Hey guys,

I'm banging my head against some perf issues on EC2. I'm using HBase 0.20.6 on ASF Hadoop 0.20.2, and tweaked the EC2 HBase scripts to handle the new version.

I'm trying to insert about 22 GB of data across nodes on EC2 m1.large instances. I'm getting speeds of about 1200 rows/minute. It seems like most inserts take <1 ms. But then some take 3 sec, and occasionally I see some take 30 sec. It felt like a GC issue, but the data volume should be nowhere near enough to cause that.

I'm using cascading.hbase, which does use the old API -- but I've never run into these perf issues before.

Ideas? I'm sure it's something painfully obvious to everyone but moi :)

Here are some logs:

NameNode: http://pastebin.com/j09CJQJJ
DataNode: http://pastebin.com/XudWcaxW
RS: http://pastebin.com/wXPBAjpu
RS GC: http://pastebin.com/jqJyKAXq

--
Bradford Stephens,
Founder, Drawn to Scale
drawntoscalehq.com
727.697.7528

http://www.drawntoscalehq.com -- The intuitive, cloud-scale data solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
-
Re: Slow Inserts on EC2 Cluster
Andrew Purtell
2010-09-01, 11:14

> From: Bradford Stephens
> I'm banging my head against some perf issues on EC2. I'm
> using .20.6 on ASF hadoop .20.2, and tweaked the ec2 hbase
> scripts to handle the new version.
>
> I'm trying to insert about 22G of data across nodes on EC2
> m1.large instances [...]

c1.xlarge provides (barely) adequate I/O bandwidth.

Those periods of higher latency that you mention in the part of your mail that I clipped are probably due to hypervisor stealing your resources to attend to a noisy neighbor with a better reservation class.

I would not consider EC2 a high performance platform, except for maybe their cluster compute nodes which have been specially engineered for HPC using a completely different virtualization and network architecture than the rest. EC2 is about bulk processing on a reasonable (subject to definition) timeframe at cheap/elastic prices.

- Andy
-
Re: Slow Inserts on EC2 Cluster
Matthew LeMieux
2010-09-01, 14:24

I'm starting to find that EC2 is not reliable enough to support HBase. I'm running into 2 things that might be related:

1) On idle machines that are apparently doing nothing (reports of <3% CPU utilization, no I/O wait) the load is reported as being higher than the number of cores. I don't know if attachments work on the mailing list, but I attached a small image anyway to illustrate this confusing thing. (I've been using m1.large and m2.xlarge running CDH3)

2) Every once in a while it seems that somebody hits the pause button on one of my instances, and while the CPU utilization stays low, the load value spikes to a high value. When this happens the region servers decide to close up shop. It appears to be a problem with contacting zookeeper servers (who happen to stay up and running, but perhaps somewhat unresponsive when Amazon decides to hit the pause button). I have extended the timeout for contacting zookeeper servers, but these events continue to persist. One such event happened 8 hours ago, and I still can't get HBase back up and running.

I've seen many comments on this list informing users that they are using hardware (or virtual machines) that are simply not big enough, not fast enough, or don't have enough memory. I'd like to offer an alternative point of view. Whether or not EC2 will last is uncertain, but cloud computing environments will definitely be around for a long time. What would it take to make HBase resilient enough to take advantage of those environments? Based on my experience and comments on this list, it seems "HBase in the cloud" is still a rather painful proposition.

-Matthew
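
For anyone wanting to check for the first symptom (load above the core count on an otherwise idle box), one rough approach is to compare the 1-minute load average with the number of cores. The sketch below uses only Python's standard library; the function name `load_exceeds_cores` is made up for illustration, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os


def load_exceeds_cores(load_1min, cores):
    """Return True when the 1-minute load average is above the core count."""
    return load_1min > cores


if __name__ == "__main__":
    # Read the live values from the machine this runs on.
    load_1min, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1
    flagged = load_exceeds_cores(load_1min, cores)
    print(f"load={load_1min:.2f} cores={cores} above_core_count={flagged}")
```

A high load with near-zero CPU and I/O wait would then point at something outside the workload itself, which is consistent with what is described above, though this check alone cannot say what the cause is.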
-
Re: Slow Inserts on EC2 Cluster
Bradford Stephens
2010-09-01, 16:38

Wow, thanks. I didn't consider that ... I try to avoid the cloud if at
all possible :)

Cheers,
B
-
RE: Slow Inserts on EC2 Cluster
Jonathan Gray
2010-09-01, 17:04

While I completely agree with much of what you're saying, and am usually one of the first to encourage people to not use virtual machines w/ HBase, I know of several successful deployments of HBase on EC2. In most instances there was some pain encountered, but it does work for some.

I've not seen these specific issues you seem to be running into (periodically spiking load but no cpu or iowait).

I'm not sure I know what HBase could do to operate better in these environments. I'm not sure I understand exactly what is happening to RS and ZooKeeper when EC2 is being weird. You can't talk to ZK because of a networking issue? Have you dug into the ZK server logs to see what's up?

HBase is a highly available service, we need to do heartbeating of some kind, so loss of network connectivity is a killer.

It could also be that ZK is being starved of IO so that it cannot write to its transaction log and that is what is slowing it down.

JG
-
Re: Slow Inserts on EC2 Cluster
Bradford Stephens
2010-09-01, 17:09

I think it's mostly a matter of cost-efficiency -- HBase *runs* just fine on EC2, and is built to be in a transient environment. It's just not always cost-effective because you have to use pricey instances.

As far as my issue -- it didn't seem to be ZK. I like Andrew's point, I'll knock it up to bigger instances and see what's up.

-B
-
Re: Slow Inserts on EC2 Cluster
Gary Helmling
2010-09-01, 17:17

On Wed, Sep 1, 2010 at 7:24 AM, Matthew LeMieux <[EMAIL PROTECTED]> wrote:

> I'm starting to find that EC2 is not reliable enough to support HBase. I'm
> running into 2 things that might be related:
>
> 1) On idle machines that are apparently doing nothing (reports of <3% CPU
> utilization, no I/O wait) the load is reported as being higher than the
> number of cores. I don't know if attachments work on the mailing list, but
> I attached a small image anyway to illustrate this confusing thing. (I've
> been using m1.large and m2.xlarge running CDH3)

If you're using AMIs based on the latest Ubuntu (10.04), there's a known kernel issue that seems to be causing high loads while idle. More info here:

https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/574910

It's possible other distros running 2.6.32 may be showing the same problem as well.
-
Re: Slow Inserts on EC2 Cluster
Andrew Purtell
2010-09-01, 17:45

> From: Matthew LeMieux
> I'm starting to find that EC2 is not reliable enough to support
> HBase. [...]
> (I've been using m1.large and m2.xlarge running CDH3)

I personally don't use EC2 for anything more than on demand ad hoc testing, but I do know of successful deployments there. However, I at least have been consistent in my advice to use c1.xlarge instances. Note, **c**1.xlarge. This instance type is what has worked reasonably well for me. Other/lesser/cheaper ones in terms of virtual compute units have not.

> What would it take to make HBase resilient enough to take
> advantage of those environments? Based on my experience
> and comments on this list, it seems "HBase
> in the cloud" is still a rather painful proposition.

This is a good question and a valid point. There is tension between

- tuning down ZooKeeper timeouts etc. to quickly identify failed nodes thus to trigger rapid redeployment of the regions to minimize their unavailability
- tuning up ZooKeeper timeouts etc. to ride over stop-the-world GC or foibles of virtualized environments

There are open JIRAs in this area. For example, https://issues.apache.org/jira/browse/HBASE-1316

If you have some ideas or code that might or do demonstrate better behavior in environments like EC2, we'd love to hear them or see it!

- Andy
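
The tension described above can be put into a toy model. This is only a sketch under simplifying assumptions: a frozen process sends no heartbeats, its session is lost once the freeze outlasts the session timeout, and a genuinely dead node is noticed only when its session expires. The function names and the 30-second pause are hypothetical; in HBase the relevant knob is, as far as I know, the `zookeeper.session.timeout` setting.

```python
def session_survives(pause_s, session_timeout_s):
    """A process frozen for pause_s seconds keeps its session only if the
    freeze is shorter than the session timeout (simplified model)."""
    return pause_s < session_timeout_s


def worst_case_detection_s(session_timeout_s):
    """A node that really died is noticed only when its session expires,
    so the timeout is also the worst-case detection delay."""
    return session_timeout_s


if __name__ == "__main__":
    pause_s = 30  # hypothetical stall: a long GC or a paused VM
    for timeout_s in (10, 40, 120):
        print(
            f"timeout={timeout_s}s "
            f"survives_{pause_s}s_pause={session_survives(pause_s, timeout_s)} "
            f"dead_node_noticed_within={worst_case_detection_s(timeout_s)}s"
        )
```

Raising the timeout makes the first column look better and the second look worse, which is the trade-off in a nutshell.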
-
Re: Slow Inserts on EC2 Cluster
Andrew Purtell
2010-09-01, 17:55

> From: Gary Helmling
>
> If you're using AMIs based on the latest Ubuntu (10.04),
> there's a known kernel issue that seems to be causing
> high loads while idle. More info here:
>
> https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/574910

Seems best to avoid using Lucid on EC2 for now, then.

FYI, the EC2 scripts that I use build AMIs based on Amazon's old FC8 AMI (with updates). See http://github.com/apurtell/hbase-ec2

- Andy
-
Re: Slow Inserts on EC2 Cluster
Bradford Stephens
2010-09-02, 00:03

'allo,

I changed the cluster from m1.large to c1.xlarge -- we're getting about 4k inserts/node/minute instead of 2k. A small improvement, but nowhere near what I'm used to, even from vague memories of old clusters on EC2.

I also stripped all the Cascading from my code and have a very basic raw MR job -- we're basically reading raw text, splitting it into fields, and adding those rows to HBase. About the simplest task you could do.

Ideas for next steps? What other info could I share?

Cheers,
B
-
Re: Slow Inserts on EC2 Cluster
Jean-Daniel Cryans
2010-09-02, 00:24

Took a quick look at your RS log, it looks like you are using a lot of families and loading them pretty much at the same rate. Look at lines that start with:

INFO org.apache.hadoop.hbase.regionserver.Store: Added ...

And you will see that you are dumping very small files on the filesystem, on average 5MB, that together account for ~64MB which is the default flush size (and then it generates tons of compactions which makes it even worse). Do you really need all those families? Try merging them and see the difference.

J-D
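
The arithmetic behind this observation can be sketched roughly as follows, assuming the ~64MB flush threshold applies to the region as a whole and every family fills at the same rate. The count of 12 families comes from later in this thread; the function names are made up for illustration.

```python
def avg_flush_file_mb(region_flush_mb, families):
    """Average store file size per family when a region-wide flush
    threshold is shared evenly across all families."""
    return region_flush_mb / families


def files_per_gb(region_flush_mb, families):
    """Approximate number of store files written per GB loaded into one
    region: one file per family on every flush."""
    flushes = 1024 / region_flush_mb
    return flushes * families


if __name__ == "__main__":
    for families in (12, 1):
        size = avg_flush_file_mb(64, families)
        count = files_per_gb(64, families)
        print(f"{families:2d} families: ~{size:.1f} MB per file, ~{count:.0f} files per GB")
```

With 12 families the files come out at a little over 5 MB each, in line with the sizes seen in the log, and there are 12 times as many files for compactions to chew through as with a single family.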
-
Re: Slow Inserts on EC2 Cluster
Bradford Stephens
2010-09-02, 00:37

Yeah, those families are all needed -- but I didn't realize the files
were so small. That's odd -- and you're right, that'd certainly throw it off. I'll merge them all and see if that helps.
-
Re: Slow Inserts on EC2 Cluster
Ryan Rawson
2010-09-02, 00:54

There are a couple of things happening here, and some solutions:

- don't flush based on region size, only on family/store size.
- do what the bigtable paper says and merge the smallest file with memstore while flushing, thus keeping the net number of files low.

The latter would probably benefit from the use of the block cache in some situations as well.
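
As a toy illustration of the second suggestion, the sketch below tracks only store file sizes and compares a plain flush with one that rewrites the smallest existing file together with the memstore. It is not HBase code; the function names and sizes are hypothetical.

```python
def flush_append(store_files_mb, memstore_mb):
    """Plain flush: the memstore becomes one more store file."""
    return sorted(store_files_mb + [memstore_mb])


def flush_merge_smallest(store_files_mb, memstore_mb):
    """Flush that rewrites the smallest existing file together with the
    memstore, so the number of files does not grow."""
    if not store_files_mb:
        return [memstore_mb]
    files = sorted(store_files_mb)
    smallest = files[0]
    return sorted(files[1:] + [smallest + memstore_mb])


if __name__ == "__main__":
    existing = [5, 20]  # hypothetical store file sizes in MB
    print("append:", flush_append(existing, 5))
    print("merge: ", flush_merge_smallest(existing, 5))
```

The merging variant keeps the file count flat at the cost of rewriting the smallest file on each flush, so whether it wins depends on how expensive that extra write is compared with the compactions it avoids.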
-
Re: Slow Inserts on EC2 Cluster
Bradford Stephens
2010-09-02, 00:56

Good call JD! We've gone from 20k inserts/minute to 200k. Much better! I still think it's slower than I'd want by about one order of magnitude, but it's progress.

Since we're populating 12 families, I guess we're seeking for 12 files on each write. Not pretty. I'll look at the customer and see if they really have any sparse data that would benefit from its own ColumnFamily. Probably not.

Cheers,
B
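
One way such a merge could look, purely as a sketch: fold each old family name into the column qualifier so that all cells land in a single family. The thread does not say how the merge was actually done; the family name `d`, the `_` separator, and the sample columns below are all hypothetical.

```python
def merge_families(row, merged_family="d", sep="_"):
    """Map a row of {(family, qualifier): value} onto a single family,
    prefixing each qualifier with its old family name to keep it unique."""
    merged = {}
    for (family, qualifier), value in row.items():
        merged[(merged_family, f"{family}{sep}{qualifier}")] = value
    return merged


if __name__ == "__main__":
    row = {
        ("name", "first"): "Ada",    # hypothetical columns
        ("addr", "city"): "London",
    }
    for column, value in merge_families(row).items():
        print(column, value)
```

The prefix keeps columns from different original families from colliding, while every write for a row now goes to one store instead of twelve.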
-
Re: Slow Inserts on EC2 Cluster
Ryan Rawson
2010-09-02, 00:59

Yes exactly, column families have the same performance profile as
tables. 12 CF = 12 tables.

-ryan
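A rough worked example of the flush arithmetic J-D describes earlier in the thread: if a flush is triggered once a region's families together hold about 64MB, and 12 families fill at the same rate, each flushed file comes out at only about 5MB. The function names below are hypothetical, and the equal-rate assumption is a simplification for illustration, not a statement of how HBase sizes files internally.

```python
def avg_flush_file_mb(flush_size_mb, families):
    """Average size of each file written by one flush.

    Assumes (simplification) that every family fills at the same rate and
    that the flush fires on the combined size of all families in the region.
    """
    if families < 1:
        raise ValueError("families must be >= 1")
    return flush_size_mb / families


def flush_size_for_target_file_mb(target_file_mb, families):
    """Flush size needed so each family's file averages target_file_mb."""
    if families < 1:
        raise ValueError("families must be >= 1")
    return target_file_mb * families


if __name__ == "__main__":
    # 12 families sharing a ~64MB flush: roughly the 5MB files seen in the RS log.
    print(round(avg_flush_file_mb(64, 12), 1))    # 5.3
    # Flush size that would give each of 12 families a ~64MB file.
    print(flush_size_for_target_file_mb(64, 12))  # 768
```

Under the same assumption, merging the 12 families into one turns each flush into a single file of about 64MB, which is consistent with the drop in small files and compactions reported in the thread.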
-
Re: Slow Inserts on EC2 Cluster
Bradford Stephens 2010-09-02, 01:58
On the full data set (10 reducers), speeds are about 100k/minute (WAL Disabled). Still much slower than I'd like, but I'll take it over the former :)
-
RE: Slow Inserts on EC2 Cluster
Jonathan Gray 2010-09-02, 03:23
Been doing lots of importing recently. There are two easy ways to get big performance boosts.
The first is HFileOutputFormat. It works into existing tables now. Consistently see 10X+ performance this way versus API.

If you must use the API, pre-create a bunch of regions for your table. You can avoid splits altogether this way. Splitting, region balancing, etc... all make data unavailable for periods of time. Avoiding this during an import is key to getting rid of all those outliers I'm sure you see.

I assume you have already played with client-side batching and all that? And writes are random?

Also, bump up your flush size if you're going to have so many families. HBASE-2375 will help when it gets finished. Until then, pre-create regions and avoid the churn. Or use bulk load.

JG
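To make the "pre-create a bunch of regions" advice concrete, here is a minimal sketch of choosing split points before an import. It assumes row keys begin with a fixed-width, uniformly distributed hex prefix (for example a hash of the natural key); that assumption, the function name, and the parameters are all hypothetical, and the sketch only computes the boundaries. How they are handed to HBase when creating the table depends on the version in use and is not shown.

```python
from typing import List


def even_split_keys(num_regions: int, key_width: int = 8) -> List[str]:
    """Return num_regions - 1 hex split points that divide the key space evenly.

    Assumes (hypothetically) that row keys start with a key_width-character,
    uniformly distributed lowercase hex prefix.
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    space = 16 ** key_width
    return [
        format(space * i // num_regions, "0{}x".format(key_width))
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # Four regions over a two-character hex prefix.
    print(even_split_keys(4, key_width=2))  # ['40', '80', 'c0']
```

If the keys are not uniformly distributed, evenly spaced boundaries will leave some regions hot; in that case the split points would need to come from a sample of the real keys instead.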
-
Re: Slow Inserts on EC2 Cluster
Andrew Purtell 2010-09-02, 19:10
> From: Bradford Stephens
> A small improvement, but nowhere near what I'm used to,
> even from vague memories of old clusters on EC2.

Those days are gone.

Used to be m1.small provided reasonable performance for some apps.

Now comments to the effect that the platform is simply too oversubscribed to use m1.small at all are common.

- Andy
-
Re: Slow Inserts on EC2 Cluster
Bradford Stephens 2010-09-03, 00:21
Ah, that explains a lot.
Thanks for the tips JGray! I shall do that ASAP.
-
Re: Slow Inserts on EC2 Cluster
Matthew LeMieux 2010-09-03, 20:38
Thank you for the pointer. I'm not sure if this is the bug I was encountering. This particular bug points to a problem with how load was calculated. The problem I was experiencing seemed to be a real issue that affected performance, not just reporting.
They published a fix on 20100827, but it doesn't seem to address the real problem of performance, just load reporting.

In any case, I've downgraded from Ubuntu Lucid (10.04) to Karmic (9.10) and am seeing load reporting and response that are far more intuitive. I recommend avoiding Lucid (at least in EC2).

I've also upgraded to the latest release candidate that J-D posted (http://people.apache.org/~jdcryans/hbase-0.89.20100830-candidate-1/); previously I was using CDH3. I'm very happy with the results. Stability is much better. It will take more than a light breeze to knock the cluster over now!

Thank you for your help,

Matthew