HBase, mail # user - Re: row filter - binary comparator at certain range


Re: row filter - binary comparator at certain range
James Taylor 2013-10-22, 03:54
One thing I neglected to mention is that the table is pre-split at the
"prepending-row-key-with-single-hashed-byte" boundaries, so the expectation
is that you'd allocate enough buckets that you don't end up needing to
split the regions. But if you under-allocate (i.e. choose too small a
SALT_BUCKETS value), then I see your point.
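
As a rough illustration of the pre-splitting described above, the sketch below prepends one hashed byte (hash mod bucket count) to each row key and lists the split points that give one region per bucket. The hash function (CRC32) and the helper names are assumptions made for illustration only; the thread does not say which hash Phoenix uses.

```python
import zlib

SALT_BUCKETS = 8  # assumed value; the thread mentions 8 or 16 as typical


def salt_byte(row_key: bytes, buckets: int = SALT_BUCKETS) -> int:
    # Hypothetical hash choice (CRC32). The hash is modded, not truncated,
    # so the result always falls in the range 0 .. buckets - 1.
    return zlib.crc32(row_key) % buckets


def salted_key(row_key: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    # Prepend the single hashed byte to the original row key.
    return bytes([salt_byte(row_key, buckets)]) + row_key


def pre_split_points(buckets: int = SALT_BUCKETS) -> list:
    # One region per bucket needs buckets - 1 boundaries: 0x01 .. buckets - 1.
    return [bytes([b]) for b in range(1, buckets)]
```

With 8 buckets this yields 7 split points, so every salted key lands in the region whose first byte matches its bucket.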

Thanks,
James
On Mon, Oct 21, 2013 at 5:58 PM, Michael Segel <[EMAIL PROTECTED]> wrote:

> James,
>
> It's evenly distributed, however... because it's a timestamp, it's a 'tail
> end charlie' addition.
> So when you split a region, the top half is never added to, so you end up
> with all regions half filled except for the last region in each 'modded'
> value.
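
The half-filled-region effect described above can be reproduced with a small simulation. This is a simplified model, not HBase's actual split logic: keys arrive in strictly increasing order (as timestamps within one bucket would), a region splits at its midpoint when it reaches `max_rows`, and only the half holding the newest keys ever receives further writes. The function name and parameters are hypothetical.

```python
def fill_regions(num_rows: int, max_rows: int) -> list:
    """Append increasing keys; split a region in half whenever it fills up.

    Returns the number of rows held by each region, in key order.
    """
    regions = [[]]  # the last region is the only one that receives writes
    for key in range(num_rows):
        regions[-1].append(key)
        if len(regions[-1]) >= max_rows:
            full = regions.pop()
            mid = len(full) // 2
            regions.append(full[:mid])  # older keys: never written to again
            regions.append(full[mid:])  # newer keys: takes all future writes
    return [len(r) for r in regions]


sizes = fill_regions(num_rows=1020, max_rows=100)
print(sizes)
```

In this model every region except the last stays at half of `max_rows`. With a hashed prefix the same pattern repeats independently inside each bucket, which matches the "last region in each 'modded' value" remark.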
>
> I wouldn't say it's a bad thing if you plan for it.
>
> On Oct 21, 2013, at 5:07 PM, James Taylor <[EMAIL PROTECTED]> wrote:
>
> > We don't truncate the hash, we mod it. Why would you expect that data
> > wouldn't be evenly distributed? We've not seen this to be the case.
> >
> >
> >
> > On Mon, Oct 21, 2013 at 1:48 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
> >
> >> What do you call hashing the row key?
> >> Or hashing the row key and then appending the row key to the hash?
> >> Or hashing the row key, truncating the hash value to some subset and then
> >> appending the row key to the value?
> >>
> >> The problem is that there is a specific meaning to the term salt. Re-using
> >> it here will cause confusion because you're implying something you don't
> >> mean to imply.
> >>
> >> You could say prepend a truncated hash of the key, however… is prepend a
> >> real word? ;-) (I am sorry, I am not a grammar nazi, nor an English major.)
> >>
> >> So even outside of Phoenix, the concept is the same.
> >> Even with a truncated hash, you will find that over time, all but the tail
> >> N regions will only be half full.
> >> This could be both good and bad.
> >>
> >> (Where N is your number of allowable hash values, 8 or 16.)
> >>
> >> You've solved potentially one problem… but still have other issues that
> >> you need to address.
> >> I guess the simple answer is to double the region sizes and not care that
> >> most of your regions will be 1/2 the max size… that is, the size you really
> >> want, while 8-16 regions will be up to twice as big.
> >>
> >>
> >>
> >> On Oct 21, 2013, at 3:26 PM, James Taylor <[EMAIL PROTECTED]> wrote:
> >>
> >>> What do you think it should be called, because
> >>> "prepending-row-key-with-single-hashed-byte" doesn't have a very good ring
> >>> to it. :-)
> >>>
> >>> Agree that getting the row key design right is crucial.
> >>>
> >>> The range of "prepending-row-key-with-single-hashed-byte" is declarative
> >>> when you create your table in Phoenix, so you typically declare an upper
> >>> bound based on your cluster size (not 255, but maybe 8 or 16). We've run
> >>> the numbers and it's typically faster, but as with most things, not always.
> >>>
> >>> HTH,
> >>> James
> >>>
> >>>
> >>> On Mon, Oct 21, 2013 at 1:05 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> Then it's not a SALT. And please don't use the term 'salt' because it has
> >>>> specific meaning outside of what you want it to mean. Just like saying
> >>>> HBase has ACID because you write the entire row as an atomic element. But
> >>>> I digress….
> >>>>
> >>>> Ok so to your point…
> >>>>
> >>>> 1 byte == 255 possible values.
> >>>>
> >>>> So which will be faster?
> >>>>
> >>>> Creating a list of the 1 byte truncated hash of each possible timestamp in
> >>>> your range, or doing 255 separate range scans with the start and stop range
> >>>> key set?
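
The second option above, one range scan per prefix value, can be sketched as follows. This is an in-memory stand-in rather than HBase client code: a sorted list of salted keys plays the role of the table, `bisect` plays the role of a scan with an inclusive start key and exclusive stop key, and all function names are hypothetical.

```python
import bisect


def salted_scan_ranges(start_key: bytes, stop_key: bytes, buckets: int) -> list:
    # One (start, stop) pair per bucket: the same logical range under each prefix.
    return [(bytes([b]) + start_key, bytes([b]) + stop_key) for b in range(buckets)]


def scan(sorted_keys: list, start: bytes, stop: bytes) -> list:
    # Stand-in for a range scan: start inclusive, stop exclusive.
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


def salted_range_query(sorted_keys: list, start_key: bytes, stop_key: bytes,
                       buckets: int) -> list:
    hits = []
    for start, stop in salted_scan_ranges(start_key, stop_key, buckets):
        # Strip the one-byte prefix so callers see the original row keys.
        hits.extend(k[1:] for k in scan(sorted_keys, start, stop))
    return sorted(hits)  # merge per-bucket results back into key order
```

The number of scans equals the number of buckets, so with a declared bucket count of 8 or 16 the fan-out is far smaller than one scan per possible byte value.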
> >>>>
> >>>> That will give you the results you want, however… I'd go back and have
> >>>> them possibly rethink the row key if they can … assuming this is the base
> >>>> access pattern.
> >>>>
> >>>> HTH
> >>>>
> >>>> -Mike
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Oct 21, 2013, at 11:37 AM, James Taylor <[EMAIL PROTECTED]>