HBase, mail # user - Re: row filter - binary comparator at certain range


+
Tony Duan 2013-10-21, 04:31
+
Michael Segel 2013-10-21, 11:36
+
Michael Segel 2013-10-21, 11:38
+
Premal Shah 2013-10-21, 05:42
+
James Taylor 2013-10-21, 06:05
+
Vladimir Rodionov 2013-10-21, 16:14
+
James Taylor 2013-10-21, 16:37
+
Michael Segel 2013-10-21, 20:05
Re: row filter - binary comparator at certain range
James Taylor 2013-10-21, 20:26
What do you think it should be called, because
"prepending-row-key-with-single-hashed-byte" doesn't have a very good ring
to it. :-)

Agree that getting the row key design right is crucial.

The range of "prepending-row-key-with-single-hashed-byte" is declared when
you create your table in Phoenix, so you typically set an upper bound based
on your cluster size (not 255, but maybe 8 or 16). We've run the numbers and
salting is typically faster, but as with most things, not always.
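
A minimal sketch of the idea, not Phoenix's actual implementation: the
leading byte is a stable hash of the row key, reduced modulo a bucket count
chosen at table creation. The names salt_byte and salted_key are
hypothetical, and zlib.crc32 is only a stand-in, since the thread does not
say which hash function Phoenix uses.

```python
import zlib


def salt_byte(row_key: bytes, buckets: int) -> int:
    """Return a stable bucket number in [0, buckets) derived from the row key."""
    if not 1 <= buckets <= 256:
        raise ValueError("a single leading byte allows between 1 and 256 buckets")
    # Assumption: crc32 stands in for whatever stable hash the store really uses.
    return zlib.crc32(row_key) % buckets


def salted_key(row_key: bytes, buckets: int) -> bytes:
    """Prepend the one-byte bucket number to the original row key."""
    return bytes([salt_byte(row_key, buckets)]) + row_key


# The same row key always maps to the same bucket, so point lookups still work.
key = salted_key(b"1382387160000", 16)
print(key[0], key[1:])
```

With 8 or 16 buckets, a range query over the original key fans out to at
most that many scans, rather than one per possible byte value.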

HTH,
James
On Mon, Oct 21, 2013 at 1:05 PM, Michael Segel <[EMAIL PROTECTED]> wrote:

> Then it's not a SALT. And please don't use the term 'salt' because it has a
> specific meaning outside of what you want it to mean.  Just like saying
> HBase has ACID because you write the entire row as an atomic element.  But
> I digress….
>
> Ok so to your point…
>
> 1 byte == 256 possible values (0 to 255).
>
> So which will be faster:
>
> creating a list of the 1-byte truncated hash of each possible timestamp in
> your range, or doing up to 256 separate range scans with the start and stop
> row keys set?
>
> That will give you the results you want, however… I'd go back and have
> them possibly rethink the row key if they can … assuming this is the base
> access pattern.
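
The second option above, one range scan per prefix value, can be sketched
as follows. This is only an illustration under stated assumptions: the row
key is taken to be one prefix byte followed by an 8-byte big-endian
timestamp, the stop row is treated as exclusive, and bucket_scan_ranges is
a hypothetical helper, not an HBase or Phoenix API.

```python
import struct


def bucket_scan_ranges(buckets: int, ts_start: int, ts_stop: int):
    """Yield one (start_row, stop_row) pair per bucket; stop_row is exclusive."""
    for bucket in range(buckets):
        prefix = bytes([bucket])
        # Fixed-width big-endian encoding makes byte order match numeric order
        # for non-negative timestamps, so a byte-wise range scan is valid.
        yield (prefix + struct.pack(">Q", ts_start),
               prefix + struct.pack(">Q", ts_stop))


# Timestamps 15 through 25 inclusive, across two buckets.
ranges = list(bucket_scan_ranges(2, 15, 26))
for start_row, stop_row in ranges:
    print(start_row.hex(), stop_row.hex())
```

Each pair would be handed to its own scan, and the results merged on the
client. The number of scans equals the number of prefix values in use.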
>
> HTH
>
> -Mike
>
>
>
>
>
> On Oct 21, 2013, at 11:37 AM, James Taylor <[EMAIL PROTECTED]> wrote:
>
> > Phoenix restricts salting to a single byte.
> > Salting perhaps is misnamed, as the salt byte is a stable hash based on
> > the row key.
> > Phoenix's skip scan supports sub-key ranges.
> > We've found salting in general to be faster (though there are cases where
> > it's not), as it ensures better parallelization.
> >
> > Regards,
> > James
> >
> >
> >
> > On Mon, Oct 21, 2013 at 9:14 AM, Vladimir Rodionov
> > <[EMAIL PROTECTED]> wrote:
> >
> >> FuzzyRowFilter does not work on sub-key ranges.
> >> Salting is bad for any scan operation, unfortunately. When salt prefix
> >> cardinality is small (1-2 bytes),
> >> one can try something similar to FuzzyRowFilter but with additional
> >> sub-key range support.
> >> If salt prefix cardinality is high (> 2 bytes) - do a full scan with
> >> your own Filter (for timestamp ranges).
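
The check such a filter would apply to each row can be sketched in a few
lines. This is a plain predicate over the textual salt:timestamp keys from
the original question, not an HBase Filter subclass, and timestamp_in_range
is a hypothetical name. A real filter would run on the region servers and
work on the key bytes.

```python
def timestamp_in_range(row_key: str, lo: int, hi: int) -> bool:
    """Check the timestamp sub-key of a 'salt:timestamp' key (inclusive range)."""
    # Ignore the salt prefix and compare only the second part of the key.
    _salt, ts = row_key.split(":", 1)
    return lo <= int(ts) <= hi


# Sample rows from the original question in this thread.
rows = ["1:15", "1:16", "1:17", "1:23", "2:3", "2:5", "2:12", "2:15", "2:19", "2:25"]
matches = [r for r in rows if timestamp_in_range(r, 15, 25)]
print(matches)
```

Every row is still read and tested, which is why this approach only makes
sense when the prefix cardinality is too high for per-prefix range scans.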
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: [EMAIL PROTECTED]
> >>
> >> ________________________________________
> >> From: Premal Shah [[EMAIL PROTECTED]]
> >> Sent: Sunday, October 20, 2013 10:42 PM
> >> To: user
> >> Subject: Re: row filter - binary comparator at certain range
> >>
> >> Have you looked at FuzzyRowFilter? Seems to me that it might satisfy
> >> your use-case.
> >>
> >>
> >> http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/
> >>
> >>
> >> On Sun, Oct 20, 2013 at 9:31 PM, Tony Duan <[EMAIL PROTECTED]> wrote:
> >>
> >>> Alex Vasilenko <aa.vasilenko@...> writes:
> >>>
> >>>>
> >>>> Lars,
> >>>>
> >>>> But how will it behave when I have a salt at the beginning of the key
> >>>> to properly shard the table across regions? Imagine a row key of the
> >>>> format salt:timestamp, with rows that go like this:
> >>>> ...
> >>>> 1:15
> >>>> 1:16
> >>>> 1:17
> >>>> 1:23
> >>>> 2:3
> >>>> 2:5
> >>>> 2:12
> >>>> 2:15
> >>>> 2:19
> >>>> 2:25
> >>>> ...
> >>>>
> >>>> And I want to find all rows whose second part (timestamp) is in the
> >>>> range 15-25. What startKey and endKey should be used?
> >>>>
> >>>> Alexandr Vasilenko
> >>>> Web Developer
> >>>> Skype:menterr
> >>>> mob: +38097-611-45-99
> >>>>
> >>>> 2012/2/9 lars hofhansl <lhofhansl@...>
> >>> Hi,
> >>> Alexandr Vasilenko
> >>> Have you ever resolved this issue? I am also facing this issue.
> >>> I also want to implement this functionality.
> >>> Imagine a row key of the format
> >>> salt:timestamp, with rows that go like this:
> >>> ...
> >>> 1:15
> >>> 1:16
> >>> 1:17
> >>> 1:23
> >>> 2:3
> >>> 2:5
> >>> 2:12
> >>> 2:15
> >>> 2:19
> >>> 2:25
> >>> ...
> >>>
> >>> And I want to find all rows, that has second part (timestamp) in range
+
Michael Segel 2013-10-21, 20:48
+
James Taylor 2013-10-21, 22:07
+
Michael Segel 2013-10-22, 00:58
+
James Taylor 2013-10-22, 03:54
+
Asaf Mesika 2013-11-01, 07:25
+
Vladimir Rodionov 2013-10-21, 16:50
+
Vladimir Rodionov 2013-10-21, 16:36
+
Tony Duan 2013-10-22, 07:55