-
Bulk loading (and/or major compaction) causing OOM
Bryan Beaudreault, 2012-12-07, 21:01
We have a couple of tables that had thousands of regions due to the size of the data in them. We recently changed them to have larger regions (nearly 4GB). We are now trying to bulk load the data into them, but every time we do, our servers die with OOM.

The logs seem to show that a major compaction is always running when the OOM happens. This is alongside normal usage from a variety of apps in our product, so the memstores, block cache, etc. are all active during this time.

I was reading through the compaction code and it doesn't look like it should take up much memory (depending on how the Reader class works). Does anyone with more knowledge of these internals know how bulk load and major compaction work with regard to memory?

We are running on EC2 c1.xlarge servers with 5GB of heap, on HBase version 0.90.4 (I know, I know, we're working to upgrade).

Thanks.
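To illustrate why a compaction would not be expected to need much memory, here is a minimal sketch of a streaming k-way merge over sorted files. It is a generic illustration of the technique, not HBase's actual compaction code; the names `KeyValue`, `read_sorted_file` and `compact` are hypothetical stand-ins.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical stand-in for an HBase KeyValue: (row key, value).
KeyValue = Tuple[str, str]


def read_sorted_file(entries: List[KeyValue]) -> Iterator[KeyValue]:
    """Yield entries one at a time, the way a scanner streams a sorted file."""
    for entry in entries:
        yield entry


def compact(files: Iterable[List[KeyValue]]) -> Iterator[KeyValue]:
    """Merge already-sorted files into one sorted stream.

    heapq.merge is lazy: it keeps one pending entry per input in a heap,
    so memory grows with the number of files, not with their total size.
    """
    return heapq.merge(*(read_sorted_file(f) for f in files))


# Three small sorted "store files" for one region.
store_files = [
    [("row1", "a"), ("row4", "d")],
    [("row2", "b"), ("row5", "e")],
    [("row3", "c"), ("row6", "f")],
]

merged = list(compact(store_files))
for key, value in merged:
    print(key, value)
```

In a real reader each open file also carries block buffers and index data, so the per-file overhead is larger than a single entry. This sketch does not settle whether that overhead is what matters in 0.90.4.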
-
Re: Bulk loading (and/or major compaction) causing OOM
Marcos Ortiz, 2012-12-07, 21:16
On 12/07/2012 04:01 PM, Bryan Beaudreault wrote:
> We have a couple tables that had thousands of regions due to the size of
> the data in them. We recently changed them to have larger regions (nearly
> 4GB). We are trying to bulk load these in now, but every time we do our
> servers die with OOM.

When you do the bulk load, do you pre-split your regions?
Which OS and which version of Java are you using?

> The logs seem to show that there is always a major compaction happening
> when the OOM happens. This is among other normal usage from a variety of
> apps in our product, so the memstores, block cache, etc are all active
> during this time.

The newer releases include a good number of improvements to compactions.

> I was reading through the compaction code and it doesn't look like it
> should take up much memory (depending on how the Reader class works).
> Does anyone with more knowledge of these internals know how bulk load
> and major compaction work with regard to memory?
>
> We are running on ec2 c1.xlarge servers with 5GB of heap, and on hbase
> version 0.90.4 (I know, I know, we're working to upgrade).

Yes, my friend. The new stable release (0.94.3) brings many benefits, so upgrading is my first piece of advice.

> Thanks.

10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci
-
Re: Bulk loading (and/or major compaction) causing OOM
Stack, 2012-12-07, 21:48
On Fri, Dec 7, 2012 at 1:01 PM, Bryan Beaudreault <[EMAIL PROTECTED]> wrote:
> We have a couple tables that had thousands of regions due to the size of
> the data in them. We recently changed them to have larger regions (nearly
> 4GB). We are trying to bulk load these in now, but every time we do our
> servers die with OOM.

You mean you are reloading the data that was once in thousands of regions into new regions of 4GB in size?

I'd be surprised if the actual bulk load brings on the OOME.

> The logs seem to show that there is always a major compaction happening
> when the OOM happens. This is among other normal usage from a variety of
> apps in our product, so the memstores, block cache, etc are all active
> during this time.

Could you turn off major compaction during the bulk load to see if that helps?

> I was reading through the compaction code and it doesn't look like it
> should take up much memory (depending on how the Reader class works).

Yes.

Are there lots of storefiles under each region?

> Does anyone with more knowledge of these internals know how bulk load
> and major compaction work with regard to memory?
>
> We are running on ec2 c1.xlarge servers with 5GB of heap, and on hbase
> version 0.90.4 (I know, I know, we're working to upgrade).

How much have you given hbase?

If you look at your cluster monitoring, are you swapping?

How many regions is each regionserver carrying?

St.Ack
-
Re: Bulk loading (and/or major compaction) causing OOM
Bryan Beaudreault, 2012-12-08, 16:50
Thanks for the responses, guys. Responses inline.

> When you are doing the bulk load, are you pre-split your regions?
> What OS are you using and what version of Java?

Yes, regions are pre-split. We calculated the splits using M/R before attempting to bulk load the data. We've done this before with smaller sizes and it has worked fine.

CentOS 5, Java 1.6.0_27

> Yes, my friend. You should know all the benefits in the new stable
> release (0.94.3), so this is the first advice.

We use CDH currently, so we are working to move to cdh4.1.2, which is on the 0.92.x branch.

On Fri, Dec 7, 2012 at 4:48 PM, Stack <[EMAIL PROTECTED]> wrote:
> You mean, you are reloading the data that once was in thousands of regions
> instead into new regions of 4GB in size?
>
> I'd be surprised if the actual bulk load brings on the OOME.

That's correct. The exact same data is currently live in an older table with thousands of smaller regions. Once we get these loaded we will swap in the new table and delete the old one.

> Could you turn off major compaction during the bulk load to see if that
> helps?

Automatic major compactions are actually off for our cluster. It looks like the servers start doing minor compactions as data is loaded in, and that is where we first saw the OOM issues. So we tried forcing major compactions earlier instead.

> Are there lots of storefiles under each region?

Yes, actually. The bulk loaded data usually seems to contain approximately 5-10 files per region, likely due to the output settings of the M/R job that creates this data.

> How much have you given hbase?
>
> If you look at your cluster monitoring, are you swapping?
>
> The regionservers are carrying how many regions per server?

The RegionServers have 5GB of heap (7.5GB total memory on a c1.xlarge, of which 1GB goes to the DN and the rest to the OS).
Swapping is disabled.
We have around 350 regions per RS currently. What we're doing now with this table is part of our effort to decrease the number of regions across all tables. We need to do it with minimal downtime, though, so it is slow going. We are aiming for around 200 regions per RS.
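For reference, the memory figures above can be laid out as a rough budget. This is a minimal sketch: the node, heap, DataNode and region numbers come from the thread, while the memstore limit, block cache fraction and flush size are assumed values (commonly cited defaults for 0.90-era HBase) that were not stated here and should be checked against the cluster's hbase-site.xml.

```python
GB = 1024 ** 3
MB = 1024 ** 2

# Figures stated in the thread.
node_memory = 7.5 * GB        # c1.xlarge total memory
regionserver_heap = 5 * GB    # RegionServer heap
datanode_memory = 1 * GB      # given to the DataNode
regions_per_server = 350

# Assumed settings, NOT stated in the thread; verify in hbase-site.xml.
memstore_upper_limit = 0.40    # hbase.regionserver.global.memstore.upperLimit
block_cache_fraction = 0.20    # hfile.block.cache.size
memstore_flush_size = 64 * MB  # hbase.hregion.memstore.flush.size

# What remains on the node for the OS once heap and DataNode are taken out.
left_for_os = node_memory - regionserver_heap - datanode_memory

# How the heap divides under the assumed fractions.
memstore_budget = regionserver_heap * memstore_upper_limit
block_cache_budget = regionserver_heap * block_cache_fraction
other_heap = regionserver_heap - memstore_budget - block_cache_budget

# Heap that would be needed if every region held one full memstore at once.
worst_case_memstores = regions_per_server * memstore_flush_size

print(f"left for OS:          {left_for_os / GB:.2f} GB")
print(f"memstore budget:      {memstore_budget / GB:.2f} GB")
print(f"block cache budget:   {block_cache_budget / GB:.2f} GB")
print(f"other heap:           {other_heap / GB:.2f} GB")
print(f"worst-case memstores: {worst_case_memstores / GB:.2f} GB")
```

Under those assumptions, roughly 2GB of the 5GB heap remains for everything else, including compactions and RPC buffers, and only about 1.5GB of the node is left for the OS.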
-
Re: Bulk loading (and/or major compaction) causing OOM
Marcos Ortiz, 2012-12-08, 20:42
On 12/08/2012 11:50 AM, Bryan Beaudreault wrote:
> We use CDH currently, so we are working to move to cdh4.1.2, which is 92.x
> branch.

Great to hear.

> We have around 350 regions per RS currently. What we're doing now with this
> table is part of our effort to decrease the number of regions across all
> tables. We need to do it with minimal downtime though so it is slow going.
> We are aiming for around 200 regions per RS.

Yes, it would be nice to see fewer regions per server. Have you considered merging some adjacent regions?

--
Marcos Luis Ortíz Valmaseda
about.me/marcosortiz <http://about.me/marcosortiz>
@marcosluis2186 <http://twitter.com/marcosluis2186>
-
Re: Bulk loading (and/or major compaction) causing OOM
Bryan Beaudreault, 2012-12-08, 21:08
Merging is not an option for us, because we cannot afford to bring our cluster down. Also, we are not yet convinced that our cluster can handle such large regions, given all the OOM issues we are seeing when trying to bring new, bigger regions online.

On Sat, Dec 8, 2012 at 3:42 PM, Marcos Ortiz <[EMAIL PROTECTED]> wrote:
> Yes, It would be nice to see less regions by servers. Have you considered
> to merge some adjacent regions?