-
Maximum number of tables? Adrien Mogenet 2012-07-13, 07:04
Hi there,
I have read some good practices about the number of columns / column families, but nothing about the number of tables. What if I need to spread my data among hundreds or thousands of (big) tables? What should I care about? I guess I should keep the number of storeFiles per RegionServer low?
--
Adrien Mogenet
http://www.mogenet.me
-
Re: Maximum number of tables? N Keywal 2012-07-13, 08:14
Hi,
There are no real limits as far as I know. As you will have one region per table (at least :-), the number of regions will be something to monitor carefully if you need thousands of tables. See http://hbase.apache.org/book.html#arch.regions.size.
Don't forget that you can add as many columns as you want, and that an empty cell costs nothing. For example, a class hierarchy is often mapped to multiple tables in an RDBMS, while in HBase having a single table for the same hierarchy makes much more sense. Moreover, there are no transactions between tables, so sometimes a 'UML composition' will go to a single table. And so on.
N.
-
Re: Maximum number of tables? Michael Segel 2012-07-13, 15:42
Currently there is a hardcoded limit on the number of regions that a region server can manage. It's 1500.
Note that if the number of regions gets to around 1000 per region server, you end up with a performance hit. (YMMV)
So if you have 1 region per table, there's a real limit of 1500 tables * number of RS nodes.
Note: You will probably die well before hitting this limit, again YMMV.
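The arithmetic in this message can be sketched as below. This is only a back-of-the-envelope illustration that takes the 1500 and 1000 figures at face value (a later reply in the thread disputes that a hard limit exists); the class name, method names, and the 10-node cluster are hypothetical and nothing here calls HBase.

```java
// Back-of-the-envelope sketch. The 1500 figure is the one quoted in the
// message above, not a value read from HBase; the cluster size is made up.
class RegionBudget {

    /** Upper bound on the number of tables if every table has exactly one region. */
    static long maxSingleRegionTables(int regionsPerServer, int regionServers) {
        return (long) regionsPerServer * regionServers;
    }

    /** Average number of regions each server carries for a given layout. */
    static double regionsPerServer(long tables, int regionsPerTable, int regionServers) {
        return (double) tables * regionsPerTable / regionServers;
    }

    public static void main(String[] args) {
        int servers = 10; // hypothetical cluster size

        // 1500 regions per server * 10 servers
        System.out.println(maxSingleRegionTables(1500, servers)); // prints 15000

        // 2000 tables with 5 regions each already reach ~1000 regions per server
        System.out.println(regionsPerServer(2000, 5, servers)); // prints 1000.0
    }
}
```

The second call shows why the practical ceiling arrives much earlier once tables are big enough to hold more than one region.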
-
Re: Maximum number of tables? Amandeep Khurana 2012-07-13, 15:50
I have come across clusters with 100s of tables, but that is typically due to a suboptimal table design.
The question here is: why do you need to distribute your data over lots of tables? What's your access pattern and what kind of data are you putting in? Or is this just a theoretical question?
-
Re: Maximum number of tables? Kevin O'dell 2012-07-13, 17:36
Mike,
I just saw a system with 2500 regions per RS (crazy, I know; we are fixing that). I did not think there was a hardcoded limit...
--
Kevin O'Dell
Customer Operations Engineer, Cloudera
-
Re: Maximum number of tables? Michael Segel 2012-07-13, 17:40
I'm going from memory. There was a hardcoded number. I'd have to go back and try to find it.
From a practical standpoint, going over 1000 regions per RS will put you on thin ice.
Too many regions can kill your system.
-
Re: Maximum number of tables? Lars George 2012-07-13, 17:47
It is basically unset:
    this.regionSplitLimit = conf.getInt("hbase.regionserver.regionSplitLimit", Integer.MAX_VALUE);
(from CompactSplitThread.java).
The number of regions is OK until you dilute the available heap share too much. So you can have >1000 regions (given the block index, file handles etc. keep up) but only a few of them can be active most of the time.
Lars
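To make the two points in this message concrete, here is a minimal sketch in plain Java. The `getInt` helper is a stand-in that only mimics the "return the default when the key is absent" behaviour of the quoted line; it is not the HBase `Configuration` class. The heap size (8192 MB) and the 0.4 fraction reserved for write buffers are assumptions chosen for illustration, not values taken from the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a map-backed stand-in for a configuration lookup,
// plus a naive "heap share per region" calculation.
class SplitLimitSketch {

    /** Returns the configured value, or defaultValue when the key is not set. */
    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String raw = conf.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    /** Even split of a heap budget across regions, in MB (assumes all regions are active). */
    static double heapShareMb(double heapMb, double fraction, int regions) {
        return heapMb * fraction / regions;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Key not set: the lookup falls back to Integer.MAX_VALUE, i.e. effectively no limit.
        int limit = getInt(conf, "hbase.regionserver.regionSplitLimit", Integer.MAX_VALUE);
        System.out.println(limit == Integer.MAX_VALUE); // prints true

        // Hypothetical 8 GB heap with 40% reserved for write buffers.
        System.out.println(heapShareMb(8192, 0.4, 100));
        System.out.println(heapShareMb(8192, 0.4, 1000));
    }
}
```

With these assumed numbers, going from 100 to 1000 regions cuts the share per region by a factor of ten, which is one way to read "only a few of them can be active most of the time".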
-
Re: Maximum number of tables? Adrien Mogenet 2012-07-14, 06:27
Thanks for these answers; it was a theoretical question. Actually, a common pattern in other solutions for batch deletion is to organize data in, for instance, one table per day and to drop the oldest table each day. That is more efficient than finding old rows and then deleting them (due to lock strategy, fragmentation, blocking compaction, etc.). Not sure it's relevant for HBase!
--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me
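The one-table-per-day pattern described in this message can be sketched roughly as follows. The `events` prefix, the `prefix_yyyyMMdd` naming scheme, and the retention window are all hypothetical; the code only computes which table names would fall out of the window and does not talk to HBase or any other store.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Sketch of date-partitioned table names with a rolling retention window.
// Naming scheme and prefix are made up for the example.
class DailyTables {

    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE; // yyyyMMdd

    /** Name of the table that holds the data for one day. */
    static String tableFor(String prefix, LocalDate day) {
        return prefix + "_" + day.format(DAY);
    }

    /**
     * Tables that fall outside the retention window, oldest first.
     * The window keeps the last keepDays days, today included.
     */
    static List<String> expired(String prefix, LocalDate firstDay, LocalDate today, int keepDays) {
        LocalDate oldestKept = today.minusDays(keepDays - 1L);
        List<String> out = new ArrayList<>();
        for (LocalDate d = firstDay; d.isBefore(oldestKept); d = d.plusDays(1)) {
            out.add(tableFor(prefix, d));
        }
        return out;
    }

    public static void main(String[] args) {
        LocalDate first = LocalDate.of(2012, 7, 10);
        LocalDate today = LocalDate.of(2012, 7, 14);

        // Keep 3 days (12th, 13th, 14th); the 10th and 11th are candidates for dropping.
        System.out.println(expired("events", first, today, 3));
        // prints [events_20120710, events_20120711]
    }
}
```

Each expired name corresponds to one whole table to drop, so no scan for old rows is needed; the cost is one extra table (and at least one region) per retained day, which ties back to the region-count concerns raised earlier in the thread.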