Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
HBase >> mail # dev >> Simple stastics per region


+
lars hofhansl 2013-02-23, 06:40
+
Andrew Purtell 2013-02-23, 17:41
+
lars hofhansl 2013-02-23, 20:39
+
Andrew Purtell 2013-02-23, 17:18
+
Stack 2013-02-26, 22:08
+
Jesse Yates 2013-02-26, 22:31
Copy link to this message
-
Re: Simple stastics per region
If this is going to be a CP then other CPs need an easy way to use the
output stats. If a subsequent proposal from core requires statistics from
this CP does that then mandate it itself must be a CP? What if that can't
work?

Putting the stats into a table addresses the first concern.

For the second, it is an issue that comes up I think when building a
generally useful shared function as a CP. Please consider inserting my
earlier comments about OSGi here, in that we trend toward a real module
system if we're not careful (unless that is the aim).
On Tue, Feb 26, 2013 at 2:31 PM, Jesse Yates <[EMAIL PROTECTED]>wrote:

> TL;DR Making it part of the UI and ensuring that you don't load things the
> wrong way seem to be the only reasons for making this part of core -
> certainly not bad reasons. They are fairly easy to handle as a CP though,
> so maybe its not necessary immediately.
>
> I ended up writing a simple stats framework last week (ok, its like 6
> classes) that makes it easy to create your own stats for a table. Its all
> coprocessor based, and as Lars suggested, hooks up to the major compactions
> to let you build per-column-per-region stats and writes it to a 'system'
> table = "_stats_".
>
> With the framework you could easily write your own custom stats, from
> simple things like min/max keys to things like fixed width or fixed depth
> histograms, or even more complicated. There has been some internal
> discussion around how to make this available to the community (as part of
> Phoenix, core in HBase, an independent github project, ...?).
>
> The biggest isssue around having it all CP based is that you need to be
> really careful to ensure that it comes _after_ all the other compaction
> coprocessors. This way you know exactly what keys come out and have correct
> statistics (for that point in time). Not a huge issue - you just need to be
> careful. Baking the stats framework into HBase is really nice in that we
> can be sure we never mess this up.
>
> Building it into the core of HBase isn't going to get us per-region
> statistics without a whole bunch of pain - compactions per store make this
> a pain to actualize; there isn't a real advantage here, as I'd like to keep
> it per CF, if only not to change all the things.
>
> Further, this would be a great first use-case for real system tables.
> Mixing this data with .META. is going to be a bit of a mess, especially for
> doing clean scans, etc. to read the stats. Also, I'd be gravely concerned
> to muck with such important state, especially if we make a 'statistic' a
> pluggable element (so people can easily expand their own).
>
> And sure, we could make it make pretty graphs on the UI, no harm in it and
> very little overhead :)
>
> -------------------
> Jesse Yates
> @jesse_yates
> jyates.github.com
>
>
> On Tue, Feb 26, 2013 at 2:08 PM, Stack <[EMAIL PROTECTED]> wrote:
>
> > On Fri, Feb 22, 2013 at 10:40 PM, lars hofhansl <[EMAIL PROTECTED]>
> wrote:
> >
> > > This topic comes up now and then (see recent discussion about
> translating
> > > multi Gets into Scan+Filter).
> > >
> > > It's not that hard to keep statistics as part of compactions.
> > > I envision two knobs:
> > > 1. Max number of distinct values to track directly. If a column has
> less
> > > this # of values, keep track of their occurrences explicitly.
> > > 2. Number of (equal width) histogram partitions to maintain.
> > >
> > > Statistics would be kept per store (i.e. per region per column family)
> > and
> > > stored into an HBase table (one row per store).Initially we could just
> > > support major compactions that atomically insert a new version of that
> > > statistics for the store.
> > >
> > >
> > Sounds great.
> >
> > In .META. add columns for each each cf on each region row?  Or another
> > table?
> >
> > What kind of stats would you keep?  Would they be useful for operators?
>  Or
> > just for stuff like say Phoenix making decisions?
> >
> >
> >
> > > An simple implementation (not knowing ahead of time how many values it

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
+
Enis Söztutar 2013-02-27, 00:15
+
lars hofhansl 2013-02-27, 00:27
+
Jesse Yates 2013-02-27, 00:31
+
Jesse Yates 2013-02-28, 01:52
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB