HBase >> mail # dev >> Simple stastics per region


Re: Simple stastics per region
I filed HBASE-7958 <https://issues.apache.org/jira/browse/HBASE-7958> to
follow up on this. Includes a summary of the discussion so far.

-------------------
Jesse Yates
@jesse_yates
jyates.github.com
On Tue, Feb 26, 2013 at 4:31 PM, Jesse Yates <[EMAIL PROTECTED]> wrote:

> The more I think about it, the more I'd like it in core. OSGi is something
> I'd like to avoid as long as we can, and baking this in makes (I think)
> more sense overall. This is especially true for how to deal with displaying
> the histograms in the UI - dependent CPs make me twitch.
>
> The things we would need to make this happen cleanly (IMO) would be:
>
>    - system tables
>       - basically metadata in the table descriptor that would hide it
>       from the usual user queries like list_tables, etc. and expose something
>       like deleteSystemTable
>    - An extra 'stat' scanner that goes on top of the store scanner used
>    for compaction that writes to the stats system table
>       - CPs could still muck with this, but as always, that's at their
>       own peril
>    - Some pretty UI graphs on the master for the stats
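(Illustration, not from the thread.) A rough sketch of the "stat scanner on top of the store scanner" idea, in plain Java. Nothing here is the real HBase API: `RowScanner`, `ListScanner`, `StatCollectingScanner`, and the choice of count/min/max as the statistic are hypothetical stand-ins. The wrapper hands every row key through unchanged and updates its stats as a side effect, so the compaction output would not change; a real version would presumably flush the collected values to the stats system table when the scan finishes.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a store scanner; NOT the HBase interface.
interface RowScanner {
    /** Returns the next row key, or null when the scan is exhausted. */
    String next();
}

// Trivial in-memory scanner, used only to drive the example.
class ListScanner implements RowScanner {
    private final Iterator<String> it;

    ListScanner(List<String> keys) {
        this.it = keys.iterator();
    }

    @Override
    public String next() {
        return it.hasNext() ? it.next() : null;
    }
}

// Decorator: delegates to the wrapped scanner and records simple stats
// (row count, smallest key, largest key) as rows flow through.
class StatCollectingScanner implements RowScanner {
    private final RowScanner delegate;
    private long count = 0;
    private String minKey = null;
    private String maxKey = null;

    StatCollectingScanner(RowScanner delegate) {
        this.delegate = delegate;
    }

    @Override
    public String next() {
        String key = delegate.next();
        if (key != null) {
            count++;
            if (minKey == null || key.compareTo(minKey) < 0) minKey = key;
            if (maxKey == null || key.compareTo(maxKey) > 0) maxKey = key;
        }
        return key; // the caller sees exactly what the delegate produced
    }

    long getCount() { return count; }
    String getMinKey() { return minKey; }
    String getMaxKey() { return maxKey; }
}
```

Because the wrapper implements the same interface as the scanner it wraps, the caller does not need to know that stats are being collected at all.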
>
> The debatable piece is then: pluggable? If so, to what degree?
>
> Something Lars just mentioned which would be nice is to have a Chore-like
> mechanism that lets people easily change the stats they want to keep track
> of. Probably along the lines of dynamic config, but simpler, since we can just push
> the changes into a waiting state element/queue-thingy and then let the next
> round of major compaction grab it without race concerns.
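(Illustration, not from the thread.) One way to read the "waiting state element/queue-thingy" idea, sketched in plain Java. `PendingStatConfig` and its method names are hypothetical, not HBase classes. The sketch assumes that only one compaction at a time calls `beginCompaction()`, so the `active` list is touched by a single thread, while requests can arrive from any thread through the concurrent queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical holder for requested stat changes; not an HBase class.
class PendingStatConfig {
    // Requests wait here until the next compaction picks them up.
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    // Stats currently being tracked; only the compaction thread touches this.
    private final List<String> active = new ArrayList<>();

    /** May be called from any thread, e.g. in response to an admin request. */
    void request(String statName) {
        pending.add(statName);
    }

    /**
     * Called once at the start of a major compaction. Drains the queue and
     * returns a snapshot, so a request arriving mid-compaction cannot change
     * what the running compaction computes.
     */
    List<String> beginCompaction() {
        String name;
        while ((name = pending.poll()) != null) {
            if (!active.contains(name)) {
                active.add(name);
            }
        }
        return new ArrayList<>(active);
    }
}
```

A request made while a compaction is running simply stays in the queue and takes effect on the following round, which is where the "without race concerns" property comes from in this sketch.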
>
> Shall I file a JIRA (and sub-JIRAs) to get this into core? We can also
> take the discussion there.
> -------------------
> Jesse Yates
> @jesse_yates
> jyates.github.com
>
>
> On Tue, Feb 26, 2013 at 4:27 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
>> Just had a discussion with the Phoenix folks (my cubicle neighbors :) ).
>> Turns out that the types of problem we're trying to solve for Phoenix
>> would need equal-depth histograms, whereas for decisions such as picking a
>> secondary index equal-width histograms are often used.
>> So a key piece in this is a proper framework through which stats can be
>> hooked up and calculated. OSGi for coprocessors would be nice, but may also
>> be overkill for this.
>> Maybe something like the chores framework would work.
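(Illustration, not from the thread.) For readers unfamiliar with the two histogram types mentioned above, a minimal Java sketch. The class and method names are hypothetical. An equal-width histogram splits the value range into buckets of the same width and counts how many values fall in each; an equal-depth histogram instead picks bucket boundaries so that each bucket holds roughly the same number of values. The sketch assumes a non-empty input and at least one bucket.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only.
class Histograms {
    /** Equal-width: every bucket spans (max - min) / buckets; returns the count per bucket. */
    static int[] equalWidthCounts(double[] values, double min, double max, int buckets) {
        int[] counts = new int[buckets];
        double width = (max - min) / buckets;
        for (double v : values) {
            int i = (int) ((v - min) / width);
            if (i >= buckets) i = buckets - 1; // v == max lands in the last bucket
            if (i < 0) i = 0;                  // clamp anything below min
            counts[i]++;
        }
        return counts;
    }

    /** Equal-depth: returns the upper bound of each bucket so buckets hold about the same number of values. */
    static double[] equalDepthBounds(double[] values, int buckets) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] bounds = new double[buckets];
        for (int b = 1; b <= buckets; b++) {
            // Index of the last element belonging to bucket b.
            int idx = (int) Math.ceil((double) b * sorted.length / buckets) - 1;
            bounds[b - 1] = sorted[idx];
        }
        return bounds;
    }
}
```

With skewed input such as {1, 2, 3, 4, 100, 200, 300, 1000}, four equal-width buckets over [0, 1000] put six of the eight values into the first bucket, while four equal-depth buckets hold two values each.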
>>
>> In either case, there will be core stats (that would allow HBase to
>> decide between a scan and a multi get), and user defined stats to help
>> higher layers such as Phoenix, or an indexing library.
>>
>>
>> -- Lars
>>
>>
>>
>> ________________________________
>>  From: Enis Söztutar <[EMAIL PROTECTED]>
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>> Sent: Tuesday, February 26, 2013 4:15 PM
>> Subject: Re: Simple stastics per region
>>
>> +1 for core. I can see that histograms might help us in automatic splits
>> and merges as well.
>>
>>
>> On Tue, Feb 26, 2013 at 3:27 PM, Andrew Purtell <[EMAIL PROTECTED]>
>> wrote:
>>
>> > If this is going to be a CP then other CPs need an easy way to use the
>> > output stats. If a subsequent proposal from core requires statistics from
>> > this CP, does that then mandate that it must itself be a CP? What if that
>> > can't work?
>> >
>> > Putting the stats into a table addresses the first concern.
>> >
>> > For the second, it is an issue that comes up I think when building a
>> > generally useful shared function as a CP. Please consider inserting my
>> > earlier comments about OSGi here, in that we trend toward a real module
>> > system if we're not careful (unless that is the aim).
>> >
>> >
>> > On Tue, Feb 26, 2013 at 2:31 PM, Jesse Yates <[EMAIL PROTECTED]> wrote:
>> >
>> > > TL;DR Making it part of the UI and ensuring that you don't load things the
>> > > wrong way seem to be the only reasons for making this part of core -
>> > > certainly not bad reasons. They are fairly easy to handle as a CP though,
>> > > so maybe it's not necessary immediately.
>> > >
>> > > I ended up writing a simple stats framework last week (ok, it's like 6