HBase >> mail # user >> Re: row filter - binary comparator at certain range

Re: row filter - binary comparator at certain range
One thing I neglected to mention is that the table is pre-split at the
"prepending-row-key-with-single-hashed-byte" boundaries, so the expectation
is that you'd allocate enough buckets that you don't end up needing to
split the regions. But if you under-allocate (i.e. allocate too small a
SALT_BUCKETS value), then I see your point.

Thanks,
James
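
For readers following the thread, here is a minimal sketch of the
"prepending-row-key-with-single-hashed-byte" scheme and the pre-split
boundaries described above. It is an illustration under assumptions, not
Phoenix's implementation: zlib.crc32 stands in for whatever hash Phoenix
actually uses, the function names are hypothetical, and SALT_BUCKETS = 8 is
just one of the values mentioned in the discussion.

```python
import zlib
from typing import List, Tuple

SALT_BUCKETS = 8  # assumed value; the thread mentions 8 or 16 as typical


def salt_byte(row_key: bytes, buckets: int = SALT_BUCKETS) -> int:
    """Hash the row key and mod it into [0, buckets).

    CRC32 is only a stand-in hash for this sketch.
    """
    return zlib.crc32(row_key) % buckets


def salted_key(row_key: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    """Prepend the single hashed byte to the original row key."""
    return bytes([salt_byte(row_key, buckets)]) + row_key


def split_points(buckets: int = SALT_BUCKETS) -> List[bytes]:
    """Pre-split boundaries: one region per bucket, split at bytes 1..buckets-1."""
    return [bytes([b]) for b in range(1, buckets)]


def scan_ranges(start: bytes, stop: bytes,
                buckets: int = SALT_BUCKETS) -> List[Tuple[bytes, bytes]]:
    """One (start, stop) range per bucket, covering an unsalted key range."""
    return [(bytes([b]) + start, bytes([b]) + stop) for b in range(buckets)]
```

With 8 buckets this yields 7 split points, and a range query over the
original keys fans out into 8 range scans, one per leading byte.
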
On Mon, Oct 21, 2013 at 5:58 PM, Michael Segel <[EMAIL PROTECTED]> wrote:

> James,
>
> It's evenly distributed, however... because it's a timestamp, it's a 'tail
> end charlie' addition.
> So when you split a region, the top half is never added to, so you end up
> with all regions half filled except for the last region in each 'modded'
> value.
>
> I wouldn't say it's a bad thing if you plan for it.
>
> On Oct 21, 2013, at 5:07 PM, James Taylor <[EMAIL PROTECTED]> wrote:
>
> > We don't truncate the hash, we mod it. Why would you expect that data
> > wouldn't be evenly distributed? We've not seen this to be the case.
> >
> >
> >
> > On Mon, Oct 21, 2013 at 1:48 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
> >
> >> What do you call hashing the row key?
> >> Or hashing the row key and then appending the row key to the hash?
> >> Or hashing the row key, truncating the hash value to some subset and then
> >> appending the row key to the value?
> >>
> >> The problem is that there is specific meaning to the term salt. Re-using
> >> it here will cause confusion because you're implying something you don't
> >> mean to imply.
> >>
> >> You could say prepend a truncated hash of the key, however… is prepend a
> >> real word? ;-) (I am sorry, I am not a grammar nazi, nor an English major.)
> >>
> >> So even outside of Phoenix, the concept is the same.
> >> Even with a truncated hash, you will find that over time, all but the tail
> >> N regions will only be half full.
> >> This could be both good and bad.
> >>
> >> (Where N is your number of allowable hash values, 8 or 16.)
> >>
> >> You've solved potentially one problem… but still have other issues that
> >> you need to address.
> >> I guess the simple answer is to double the region sizes and not care that
> >> most of your regions will be 1/2 the max size (which is the size you really
> >> want), while 8-16 regions will be up to twice as big.
> >>
> >>
> >>
> >> On Oct 21, 2013, at 3:26 PM, James Taylor <[EMAIL PROTECTED]> wrote:
> >>
> >>> What do you think it should be called, because
> >>> "prepending-row-key-with-single-hashed-byte" doesn't have a very good ring
> >>> to it. :-)
> >>>
> >>> Agree that getting the row key design right is crucial.
> >>>
> >>> The range of "prepending-row-key-with-single-hashed-byte" is declarative
> >>> when you create your table in Phoenix, so you typically declare an upper
> >>> bound based on your cluster size (not 255, but maybe 8 or 16). We've run
> >>> the numbers and it's typically faster, but as with most things, not always.
> >>>
> >>> HTH,
> >>> James
> >>>
> >>>
> >>> On Mon, Oct 21, 2013 at 1:05 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> Then it's not a SALT. And please don't use the term 'salt' because it has
> >>>> a specific meaning outside of what you want it to mean. Just like saying
> >>>> HBase has ACID because you write the entire row as an atomic element. But
> >>>> I digress….
> >>>>
> >>>> Ok so to your point…
> >>>>
> >>>> 1 byte == 255 possible values.
> >>>>
> >>>> So which will be faster?
> >>>>
> >>>> Creating a list of the 1 byte truncated hash of each possible timestamp in
> >>>> your range, or doing 255 separate range scans with the start and stop range
> >>>> key set?
> >>>>
> >>>> That will give you the results you want, however… I'd go back and have
> >>>> them possibly rethink the row key if they can … assuming this is the base
> >>>> access pattern.
> >>>>
> >>>> HTH
> >>>>
> >>>> -Mike
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Oct 21, 2013, at 11:37 AM, James Taylor <[EMAIL PROTECTED]>