HBase, mail # user - Column qualifiers with hierarchy and filters


Re: Column qualifiers with hierarchy and filters
Nasron Cheong 2013-11-06, 14:48
Yes, after some digging around, the key is to store integers as their byte
representation and, more importantly, to store them big-endian so that
the lexicographical sequence is maintained.

Thanks!

- Nasron
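
To illustrate the point about big-endian encoding, here is a minimal sketch in Python rather than the HBase Java API. The helper names (encode_qualifier, bucket_prefix), the 4-byte width, and the restriction to non-negative buckets are assumptions made for the example and are not from the thread. Python compares bytes objects byte by byte as unsigned values, which is the same ordering a lexicographic byte comparison uses.

```python
import struct

def encode_qualifier(bucket: int, msg_type: str) -> bytes:
    """Hypothetical helper: 4-byte big-endian bucket, then ':' and the msg type."""
    if not 0 <= bucket <= 0xFFFFFFFF:
        # Assumption: non-negative buckets only. A signed negative value
        # would sort after the positives because its top bit is set.
        raise ValueError("bucket must fit in an unsigned 32-bit integer")
    return struct.pack(">I", bucket) + b":" + msg_type.encode("utf-8")

def bucket_prefix(bucket: int) -> bytes:
    """Hypothetical helper: the byte prefix shared by all qualifiers of one bucket."""
    return struct.pack(">I", bucket) + b":"

buckets = [1, 1000, 2, 256]

# Big-endian: sorting the raw bytes gives numeric order.
qualifiers = sorted(encode_qualifier(b, "abc") for b in buckets)
decoded = [struct.unpack(">I", q[:4])[0] for q in qualifiers]
# decoded == [1, 2, 256, 1000]

# Little-endian: the least significant byte comes first, so byte order
# no longer matches numeric order.
little = sorted(buckets, key=lambda b: struct.pack("<I", b))
# little == [256, 1, 2, 1000]
```

Because every bucket occupies exactly four bytes, the prefix for one bucket can never be a prefix of another bucket's qualifier, which is what a prefix-based filter relies on.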
On Tue, Nov 5, 2013 at 8:28 PM, Premal Shah <[EMAIL PROTECTED]> wrote:

> You can store the fixed-length byte representation of the integer instead
> of its string form (which has variable length), and the values will
> then also sort correctly.
>
>
> On Tue, Nov 5, 2013 at 1:58 PM, Nasron Cheong <[EMAIL PROTECTED]> wrote:
>
> > Yes, it's limited in the sense that we have to precalculate the number of
> > digits required so we don't run out, and if we overestimate, then our row
> > keys end up taking up more space than we'd like.
> >
> > We can probably live with this approach for now, but I wonder if there's a
> > better way.
> >
> > - Nasron
> >
> >
> > On Tue, Nov 5, 2013 at 12:28 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Nasron,
> > >
> > > Why are you saying that it's a limited way? Does it meet your needs?
> > >
> > >
> > > 2013/11/4 Nasron Cheong <[EMAIL PROTECTED]>
> > >
> > > > An example query would be the following. Say the column qualifier is of
> > > > the form
> > > >
> > > > <bucket #>:<msg type>
> > > >
> > > > where <bucket #> is an integer value and <msg type> is a string. E.g.
> > > >
> > > > 1:abc
> > > > 1000:abc
> > > > 2:abc
> > > >
> > > > would appear in the above sequence, which is out of order when doing prefix
> > > > filtering. Zero padding could fix this:
> > > >
> > > > 0001:abc
> > > > 0002:abc
> > > > 1000:abc
> > > >
> > > > But it is a limited way of ensuring that the sequence of CQs (column
> > > > qualifiers) is correct so that prefix filtering works. Are there other
> > > > options?
> > > >
> > > > - Nasron
> > > >
> > > >
> > > > On Thu, Oct 31, 2013 at 9:19 PM, Nasron Cheong <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm trying to determine the best way to serialize a sequence of
> > > > > integers/strings that represent a hierarchy for a column qualifier,
> > > > > which would be compatible with ColumnPrefixFilters and BinaryComparators.
> > > > >
> > > > > However, due to the lexicographical sorting, it's awkward to serialize the
> > > > > sequence of values needed to get it to work.
> > > > >
> > > > > What are the typical solutions to this? Do people just zero pad integers
> > > > > to make sure they sort correctly? Or do I have to implement my own
> > > > > QualifierFilter - which seems expensive since I'd be deserializing every
> > > > > byte array just to compare?
> > > > >
> > > > > Thanks
> > > > >
> > > > > - Nasron
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Regards,
> Premal Shah.
>
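
The sorting problem and the zero-padding workaround discussed in the thread can be reproduced with a short sketch. The helper names (plain, padded) and the default width of 4 are assumptions made for illustration. Note that ':' (0x3A) sorts after the ASCII digits, so in a strict byte comparison "1000:abc" actually lands before "1:abc"; either way, bucket 2 ends up after bucket 1000.

```python
def plain(bucket: int, msg_type: str) -> bytes:
    """Hypothetical helper: decimal bucket with no padding."""
    return f"{bucket}:{msg_type}".encode("ascii")

def padded(bucket: int, msg_type: str, width: int = 4) -> bytes:
    """Hypothetical helper: decimal bucket zero-padded to a fixed width."""
    return f"{bucket:0{width}d}:{msg_type}".encode("ascii")

buckets = [1, 1000, 2]

# Unpadded decimal strings: byte order does not follow numeric order.
plain_sorted = sorted(plain(b, "abc") for b in buckets)
# plain_sorted == [b"1000:abc", b"1:abc", b"2:abc"]

# Zero-padded to 4 digits: byte order follows numeric order.
padded_sorted = sorted(padded(b, "abc") for b in buckets)
# padded_sorted == [b"0001:abc", b"0002:abc", b"1000:abc"]

# The limitation raised in the thread: once a bucket needs more digits than
# the chosen width, the ordering breaks again (10000 sorts before 1000).
overflow_sorted = sorted(padded(b, "abc") for b in [1000, 10000])
# overflow_sorted == [b"10000:abc", b"1000:abc"]
```

This is why the width has to be chosen up front: too small and large buckets sort incorrectly, too large and every qualifier carries extra bytes.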