-
keeping an active hdfs cluster balanced
Stuart Smith 2011-03-17, 19:13
Parts of this may end up on the hbase list, but I thought I'd start here. My basic problem is:
My cluster is getting full enough that having one data node go down does put a bit of pressure on the system (when balanced, every DN is more than half full).
I write (and delete) pretty actively to HBase, and some to HDFS directly.
The cluster keeps drifting dangerously out of balance.
I run the balancer daily, but:
- I've seen reports that you shouldn't rebalance with regionservers running, yet, I don't really have a choice. Without HBase, my system is pretty much down. If it gets out of balance, it will also come down.
Anybody here have any idea how badly running the balancer on a heavily active system messes things up? (for hdfs/hbase - if anyone knows).
- Possibly somewhat related: I'm seeing more "failed to move block" errors in my balancer logs. It got to the point where I wasn't seeing any effective rebalancing occur. I've turned off access to the cluster and rebalanced (one node was down to 10% free space, a couple of others went up to 50% or more). I'm back down to around 20-40% free space on each node (as reported by the hdfs web interface).
How effective is the balancer on an active cluster? Is there any way to make its life easier, so it can stay in balance with daily runs?
I'm not sure why the one node ends up being so heavily favored, either. The favoritism even seems to survive taking the node down, and bringing it back up. If I can't find the resources to upgrade, I might try that again, but I'm less than hopeful about it.
Any ideas? Or do I just need better hardware? Not sure if that's an option, though..
Take care, -stu
-
Re: keeping an active hdfs cluster balanced
Allen Wittenauer 2011-03-17, 21:20
On Mar 17, 2011, at 12:13 PM, Stuart Smith wrote:
> Parts of this may end up on the hbase list, but I thought I'd start here. My basic problem is:
>
> My cluster is getting full enough that having one data node go down does put a bit of pressure on the system (when balanced, every DN is more than half full).
Usually around the ~80% full mark is when HDFS starts getting a bit wonky on super active grids. Your best bet is to either delete some data/store the data more efficiently, add more nodes, or upgrade the storage capacity of the nodes you have. The balancer is only going to save you for so long until the whole thing tips over.
> Anybody here have any idea how badly running the balancer on a heavily active system messes things up? (for hdfs/hbase - if anyone knows).
I don't run HBase, but at Y! we used to run the balancer pretty much every day, even on super active grids. It 'mostly works' until you get to the point of no return, which it sounds like you are heading for...
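As background to the daily balancer runs discussed here: the HDFS balancer treats a datanode as balanced when its utilization (used space divided by that node's own capacity) is within a threshold of the cluster-wide utilization, and the default threshold is 10 percentage points. The following is a minimal Python sketch of that check only, not the balancer's actual code. The node names and figures are made up for illustration and are not taken from Stuart's cluster.

```python
def classify_nodes(nodes, threshold_pct=10.0):
    """Label each datanode relative to the cluster-wide utilization.

    nodes maps a (hypothetical) node name to (used_tb, capacity_tb).
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    cluster_pct = 100.0 * total_used / total_cap

    labels = {}
    for name, (used, cap) in nodes.items():
        node_pct = 100.0 * used / cap
        if node_pct > cluster_pct + threshold_pct:
            labels[name] = "over-utilized"
        elif node_pct < cluster_pct - threshold_pct:
            labels[name] = "under-utilized"
        else:
            labels[name] = "within threshold"
    return cluster_pct, labels


# Illustrative figures only (used TB, capacity TB).
example_nodes = {
    "dn1": (2.7, 3.0),  # 90% used
    "dn2": (1.5, 3.0),  # 50% used
    "dn3": (1.2, 2.0),  # 60% used
    "dn4": (1.0, 2.0),  # 50% used
}
cluster_pct, labels = classify_nodes(example_nodes)
```

Because the comparison is made on percentages rather than bytes, nodes of different sizes are balanced toward the same fill ratio, not the same amount of data.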
> Any ideas? Or do I just need better hardware? Not sure if that's an option, though..
Depending upon how your systems are configured, something else to look at is how much space is getting eaten by logs, mapreduce spill space, etc. A good daemon bounce might free up some stale handles as well.
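One way to act on the suggestion about logs and spill space is to total up the non-HDFS directories that share disks with the datanode. The sketch below is a rough illustration in Python. The directory paths are hypothetical placeholders, and the real locations depend on how the daemons and `mapred.local.dir` are configured on each node.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # File vanished or is unreadable; logs rotate while we scan.
                pass
    return total


def report(paths):
    """Return (path, size_in_gib) pairs, largest first."""
    sizes = [(p, dir_size_bytes(p) / 2**30) for p in paths]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


# Hypothetical locations; substitute the log and mapred.local.dir
# directories from your own configuration.
CANDIDATE_DIRS = ["/var/log/hadoop", "/var/log/hbase", "/data/1/mapred/local"]

if __name__ == "__main__":
    for path, gib in report(CANDIDATE_DIRS):
        print(f"{gib:8.2f} GiB  {path}")
```

A directory that does not exist simply reports zero, so the same list can be run unchanged on nodes with slightly different layouts.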
-
Re: keeping an active hdfs cluster balanced
Ted Dunning 2011-03-17, 21:23
How large a cluster?
How large is each data-node? How much disk is devoted to hbase?
How does your HDFS data arrive? From one or a few machines in the cluster? From outside the cluster?
-
Re: keeping an active hdfs cluster balanced
stu24mail@... 2011-03-17, 22:09
Thanks Allen!
This all makes sense. I'm already looking into expiring data - and good suggestion with the logs. I could store some data more efficiently - but I'm not sure if I have any big wins I can pull off.
I'm in the midst of an OS upgrade & hope to switch from Apache to CDH as well. Hopefully I can clean some stuff up in the process.
It does sound like I'm just going to have to find some hardware somewhere..
Take care, -stu
-
Re: keeping an active hdfs cluster balanced
stu24mail@... 2011-03-17, 22:15
Hello Ted,
I have a small, 8 DN cluster, 6 of which are regionservers. Some have 3TB, others have 2TB. All have all disks available to hdfs - including the OS/system disk :|
The majority of the data goes to HBase, which then writes to hdfs. Some data is written to hdfs via thrift.
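To put a number on the "one data node goes down" pressure for a cluster of this shape, here is a back-of-the-envelope sketch in Python. It assumes HDFS re-replicates all of the lost node's blocks onto the survivors, so the total stored data stays the same while capacity shrinks. The split of four 3 TB and four 2 TB nodes and the uniform 60% fill are assumptions for illustration, since the thread does not give them.

```python
def utilization_after_loss(capacities_tb, used_fraction, lost_index):
    """Cluster-wide fill ratio once one node's data is re-replicated.

    capacities_tb: per-node capacity in TB.
    used_fraction: fill ratio of the cluster before the failure (0..1).
    lost_index: position of the failed node in capacities_tb.
    """
    total_used = used_fraction * sum(capacities_tb)
    remaining = sum(c for i, c in enumerate(capacities_tb) if i != lost_index)
    return total_used / remaining


# Assumed layout: 8 datanodes, four with 3 TB and four with 2 TB.
capacities = [3, 3, 3, 3, 2, 2, 2, 2]

# Cluster 60% full, then one of the 3 TB nodes fails.
after = utilization_after_loss(capacities, 0.60, lost_index=0)
print(f"fill after losing a 3 TB node: {after:.1%}")
```

With only eight nodes, a single failure removes a large share of total capacity, which is why the same fill percentage is riskier here than on a large grid. The sketch also ignores the space taken by the OS on the shared system disk.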
Take care, -stu
-
Re: keeping an active hdfs cluster balanced
Koji Noguchi 2011-03-24, 18:12
Just a note
> Usually around the ~80% full mark is when HDFS starts getting a bit wonky
These days, we have large grids over 90% full and still running fine. Percentage of hdfs space could be misleading. We usually monitor the percentage of full datanodes.
Koji
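The difference between the two metrics mentioned here can be sketched in a few lines of Python. This is an illustration only: the 95% cutoff for calling a datanode "full" and the per-node figures are assumptions, not values from the thread.

```python
def cluster_used_pct(nodes):
    """Overall HDFS usage: total used over total capacity, in percent."""
    used = sum(u for u, _ in nodes)
    cap = sum(c for _, c in nodes)
    return 100.0 * used / cap


def full_datanode_pct(nodes, full_at_pct=95.0):
    """Percentage of datanodes at or above the (assumed) 'full' cutoff."""
    full = sum(1 for u, c in nodes if 100.0 * u / c >= full_at_pct)
    return 100.0 * full / len(nodes)


# Two made-up 8-node clusters, each node listed as (used_tb, capacity_tb).
even = [(1.6, 2.0)] * 8                        # every node 80% used
skewed = [(2.0, 2.0)] * 4 + [(1.2, 2.0)] * 4   # half full, half at 60%

for label, nodes in (("even", even), ("skewed", skewed)):
    print(label, round(cluster_used_pct(nodes), 1), full_datanode_pct(nodes))
```

Both clusters report the same overall usage, yet in the skewed one half of the datanodes cannot accept new blocks, which is the situation the overall percentage hides.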
-
Re: keeping an active hdfs cluster balanced
stu24mail@... 2011-03-24, 21:00
Thanks Koji! Is each node a small percentage of the total space in this case?
Take care, -stu