


Re: Maximum number of tables ?
Thanks for these answers; it was a theoretical question. Actually, a
common pattern in other solutions for batch deletion is to organize data
into - for instance - one table per day and to drop the oldest table day
after day. That is more efficient than finding the old rows and then
deleting them (due to the locking strategy, fragmentation, blocking
compactions, etc.). Not sure it's relevant for HBase!
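(As an aside, and purely as an illustrative sketch rather than anything
tested here: with the 0.92/0.94-era client API, such a rotation would look
roughly like the snippet below, assuming hypothetical daily tables named
events_YYYYMMDD with a single column family "d".)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class DailyTableRotation {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Create today's table if it does not exist yet.
        String today = "events_20120714";
        if (!admin.tableExists(today)) {
          HTableDescriptor desc = new HTableDescriptor(today);
          desc.addFamily(new HColumnDescriptor("d"));
          admin.createTable(desc);
        }

        // Expire old data by dropping the oldest table wholesale,
        // instead of scanning for old rows and deleting them one by one.
        String oldest = "events_20120614";
        if (admin.tableExists(oldest)) {
          admin.disableTable(oldest);
          admin.deleteTable(oldest);
        }

        admin.close();
      }
    }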

On Fri, Jul 13, 2012 at 7:47 PM, Lars George <[EMAIL PROTECTED]> wrote:

> It is basically unset:
>
>     this.regionSplitLimit = conf.getInt("hbase.regionserver.regionSplitLimit",
>         Integer.MAX_VALUE);
>
> (from CompactSplitThread.java).
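For reference (an illustrative aside, not from Lars's mail): since
HBaseConfiguration.create() loads hbase-default.xml and hbase-site.xml from
the classpath, a quick way to see what the configuration on your classpath
resolves this property to is something like:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class SplitLimitCheck {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Falls back to Integer.MAX_VALUE (2147483647) when neither
        // hbase-default.xml nor hbase-site.xml sets the property.
        int limit = conf.getInt("hbase.regionserver.regionSplitLimit",
            Integer.MAX_VALUE);
        System.out.println("regionSplitLimit = " + limit);
      }
    }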
>
> The number of regions is OK until you dilute the available heap share too
> much. So you can have >1000 regions (given the block index, file handles,
> etc. keep up), but only a few of them can be active most of the time.
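(To put illustrative numbers on that heap-share point, using defaults of
that era rather than figures from this thread: a 10 GB heap with the default
40% global memstore limit leaves about 4 GB for memstores, and at a 128 MB
flush size that is roughly 4096 / 128 ≈ 32 regions that can take writes
concurrently before flushes get forced; the rest of the >1000 regions are
mostly idle at any given moment.)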
>
> Lars
>
> On Jul 13, 2012, at 7:40 PM, Michael Segel wrote:
>
> > I'm going from memory. There was a hardcoded number. I'd have to go
> > back and try to find it.
> >
> > From a practical standpoint, going over 1000 regions per RS will put
> > you on thin ice.
> >
> > Too many regions can kill your system.
> >
> > On Jul 13, 2012, at 12:36 PM, Kevin O'dell wrote:
> >
> >> Mike,
> >>
> >> I just saw a system with 2500 Regions per RS (crazy, I know; we are
> >> fixing that).  I did not think there was a hard-coded limit...
> >>
> >> On Fri, Jul 13, 2012 at 11:50 AM, Amandeep Khurana <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >>> I have come across clusters with 100s of tables, but that is typically
> >>> due to a suboptimal table design.
> >>>
> >>> The question here is - why do you need to distribute your data over
> >>> lots of tables? What's your access pattern and what kind of data are
> >>> you putting in? Or is this just a theoretical question?
> >>>
> >>> On Jul 13, 2012, at 12:05 AM, Adrien Mogenet <[EMAIL PROTECTED]>
> >>> wrote:
> >>>
> >>>> Hi there,
> >>>>
> >>>> I read some good practices about the number of columns / column
> >>>> families, but nothing about the number of tables.
> >>>> What if I need to spread my data among hundreds or thousands of (big)
> >>>> tables? What should I care about? I guess I should keep a tight
> >>>> number of storeFiles per RegionServer?
> >>>>
> >>>> --
> >>>> Adrien Mogenet
> >>>> http://www.mogenet.me
> >>>
> >>
> >>
> >>
> >> --
> >> Kevin O'Dell
> >> Customer Operations Engineer, Cloudera
> >
>
>
--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me