Re: Questions on FuzzyRowFilter
Adrien Mogenet 2014-05-03, 08:11
Using 4 random bytes gives you 2^32 possible prefixes, so your data will be
spread well across all the possible regions, but you won't be able to
easily benefit from distributed scans to gather what you want, since a
range read would need one scan per prefix.
Let's say you want to split (time+login) with a salted key and you expect
to be able to retrieve events from 20140429 pretty fast. Then I would split
the input data among 10 "spans", spread over 10 regions and 10 region
servers (i.e. `$random % 10'). To retrieve ordered data, I would run Scans
in parallel over the 10 span groups (<00>-20140429, <01>-20140429...) and
merge-sort the results until I've got everything I expect.
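
Here is a rough sketch of that idea in plain Python, with an in-memory list
standing in for the HBase table. It is only an illustration under
assumptions: the key layout `<salt>-<day>-<login>' and the helper names
(salted_key, scan_span, scan_day) are made up for the example, and each
"scan" is just a prefix filter, not a real HBase Scan.

```python
import heapq
import random
from concurrent.futures import ThreadPoolExecutor

NUM_SPANS = 10  # number of salt buckets, as in the "$random % 10" example


def salted_key(day, login, rng=random):
    """Build a row key '<salt>-<day>-<login>' (hypothetical layout)."""
    salt = rng.randrange(NUM_SPANS)
    return f"{salt:02d}-{day}-{login}"


def scan_span(table, salt, day):
    """Stand-in for one Scan: sorted keys starting with '<salt>-<day>'."""
    prefix = f"{salt:02d}-{day}"
    return sorted(key for key in table if key.startswith(prefix))


def strip_salt(key):
    """Drop the '<salt>-' prefix so rows compare by (day, login) only."""
    return key.split("-", 1)[1]


def scan_day(table, day):
    """Run one scan per span in parallel, then merge-sort the results."""
    with ThreadPoolExecutor(max_workers=NUM_SPANS) as pool:
        per_span = list(
            pool.map(lambda salt: scan_span(table, salt, day), range(NUM_SPANS))
        )
    # Each per-span list is already sorted, so a k-way merge is enough.
    merged = heapq.merge(*per_span, key=strip_salt)
    return [strip_salt(key) for key in merged]
```

With 10 spans, reading one day costs 10 prefix scans plus a 10-way merge,
whatever the number of rows; with a 4-byte random prefix the same read
would have to cover up to 2^32 prefixes.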
So in terms of performance this looks "a little bit" faster than your 2^32
On Fri, May 2, 2014 at 10:09 PM, Software Dev <[EMAIL PROTECTED]> wrote: