HBase, mail # dev - Simple stastics per region


Re: Simple stastics per region
Jesse Yates 2013-02-27, 00:31
The more I think about it, the more I'd like it in core. OSGi is something
I'd like to avoid as long as we can, and baking this in makes (I think)
more sense overall. This is especially true for how to deal with displaying
the histograms in the UI - dependent CPs make me twitch.

The things we would need to make this happen cleanly (IMO) would be:

   - system tables
      - basically metadata in the table descriptor that would hide it from
      the usual user queries like list_tables, etc. and expose something like
      deleteSystemTable
   - An extra 'stat' scanner that goes on top of the store scanner used for
   compaction that writes to the stats system table
      - CPs could still muck with this, but as always, that's at their own
      peril
   - Some pretty UI graphs on the master for the stats
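
As a rough illustration of the 'stat' scanner idea in the list above, here is a sketch in Python (for brevity; the real thing would be Java inside the region server). Everything in it is hypothetical: the (row, column, value) cell shape, the stat_scanner name, and the dict standing in for the stats system table are assumptions, not HBase API.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, Tuple

# Hypothetical cell shape: (row, column, value). Not the HBase KeyValue API.
Cell = Tuple[bytes, bytes, bytes]


def stat_scanner(cells: Iterable[Cell],
                 stats_sink: Callable[[Dict[bytes, int]], None]) -> Iterator[Cell]:
    """Pass cells through unchanged while counting cells per column.

    The counts are handed to stats_sink once the wrapped scan is exhausted,
    which stands in for writing a row to the stats system table.
    """
    per_column = Counter()
    for cell in cells:
        per_column[cell[1]] += 1
        yield cell  # the compaction still sees every cell, in order
    stats_sink(dict(per_column))


# Stand-in for the stats system table, keyed by a made-up region name.
stats_table = {}
region_cells = [
    (b"r1", b"cf:a", b"1"),
    (b"r2", b"cf:a", b"2"),
    (b"r2", b"cf:b", b"3"),
]
compacted = list(
    stat_scanner(region_cells, lambda s: stats_table.update({"region-1": s}))
)
```

The point of wrapping rather than rescanning is that the stats come for free from a pass the compaction already makes; the wrapper never alters or drops a cell.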

The debatable piece is then: pluggable? If so, to what degree?

Something Lars just mentioned which would be nice is to have a Chore-like
mechanism that lets people easily change the stats they want to keep track
of. Probably along the lines of dynamic config, but simpler, since we can just
push the changes into a waiting state element/queue-thingy and then let the
next round of major compaction grab them without race concerns.
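
One way the waiting queue could look, sketched in Python rather than Java. The StatsConfig class and its method names are invented for illustration, and the sketch assumes a single compaction thread per config object calls begin_major_compaction:

```python
import queue


class StatsConfig:
    """Active set of stat names plus changes waiting to be applied."""

    def __init__(self, initial):
        self.active = set(initial)
        self._pending = queue.Queue()  # thread-safe hand-off from admin threads

    def request_change(self, add=(), remove=()):
        # Safe to call from any thread; nothing takes effect until the
        # next major compaction drains the queue.
        self._pending.put((frozenset(add), frozenset(remove)))

    def begin_major_compaction(self):
        # Apply everything queued so far, then return an immutable snapshot
        # that this compaction uses for its whole run.
        while True:
            try:
                add, remove = self._pending.get_nowait()
            except queue.Empty:
                break
            self.active = (self.active | add) - remove
        return frozenset(self.active)


config = StatsConfig(["row_count"])
config.request_change(add=["equal_depth_histogram"])
before = frozenset(config.active)          # the change is still only queued
snapshot = config.begin_major_compaction()  # now it is picked up
```

Because a compaction only ever reads the snapshot taken at its start, a change requested mid-compaction simply waits for the next round.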

Shall I file a JIRA (and sub-JIRAs) to get this into core? We can also take
the discussion there.
-------------------
Jesse Yates
@jesse_yates
jyates.github.com
On Tue, Feb 26, 2013 at 4:27 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Just had a discussion with the Phoenix folks (my cubicle neighbors :) ).
> Turns out that the types of problem we're trying to solve for Phoenix
> would need equal-depth histograms, whereas for decisions such as picking a
> 2ndary index equal-width histograms are often used.
> So a key in this is a proper framework through which stats can be hooked up
> and calculated. OSGi for coprocessors would be nice, but may also be
> overkill for this.
> Maybe something like the chores framework would work.
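
For anyone following along, the difference between the two histogram types can be sketched as below. This is a toy Python illustration on integer keys, not anything from Phoenix or HBase; the function names are made up, and the equal-depth version ignores the duplicate-boundary problem that heavily repeated values would cause.

```python
def equal_width_edges(values, buckets):
    """Bucket boundaries that split the value range into equal-sized intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets
    return [lo + i * width for i in range(buckets + 1)]


def equal_depth_edges(values, buckets):
    """Bucket boundaries chosen so each bucket holds roughly the same count."""
    ordered = sorted(values)
    n = len(ordered)
    edges = [ordered[(i * n) // buckets] for i in range(buckets)]
    edges.append(ordered[-1])
    return edges


def bucket_counts(values, edges):
    """Count values per bucket; the last bucket includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


keys = [1, 2, 3, 4, 5, 6, 7, 100]  # skewed: one outlier far from the rest
width_counts = bucket_counts(keys, equal_width_edges(keys, 4))  # [7, 0, 0, 1]
depth_counts = bucket_counts(keys, equal_depth_edges(keys, 4))  # [2, 2, 2, 2]
```

On skewed data the equal-width buckets are mostly empty, while the equal-depth boundaries follow the data, which is why the two serve different decisions.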
>
> In either case, there will be core stats (that would allow HBase to decide
> between a scan and a multi get), and user defined stats to help higher
> layers such as Phoenix, or an indexing library.
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Enis Söztutar <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Sent: Tuesday, February 26, 2013 4:15 PM
> Subject: Re: Simple stastics per region
>
> +1 for core. I can see that histograms might help us in automatic splits
> and merges as well.
>
>
> On Tue, Feb 26, 2013 at 3:27 PM, Andrew Purtell <[EMAIL PROTECTED]>
> wrote:
>
> > If this is going to be a CP then other CPs need an easy way to use the
> > output stats. If a subsequent proposal for core requires statistics from
> > this CP, does that then mandate that it must itself be a CP? What if that
> > can't work?
> >
> > Putting the stats into a table addresses the first concern.
> >
> > For the second, it is an issue that comes up I think when building a
> > generally useful shared function as a CP. Please consider inserting my
> > earlier comments about OSGi here, in that we trend toward a real module
> > system if we're not careful (unless that is the aim).
> >
> >
> > On Tue, Feb 26, 2013 at 2:31 PM, Jesse Yates <[EMAIL PROTECTED]> wrote:
> >
> > > TL;DR Making it part of the UI and ensuring that you don't load things the
> > > wrong way seem to be the only reasons for making this part of core -
> > > certainly not bad reasons. They are fairly easy to handle as a CP though,
> > > so maybe it's not necessary immediately.
> > >
> > > I ended up writing a simple stats framework last week (ok, it's like 6
> > > classes) that makes it easy to create your own stats for a table. It's all
> > > coprocessor based and, as Lars suggested, hooks up to the major compactions
> > > to let you build per-column-per-region stats and write them to a 'system'
> > > table = "_stats_".
> > >
> > > With the framework you could easily write your own custom stats, from