HBase >> mail # user >> Column qualifiers with hierarchy and filters


Re: Column qualifiers with hierarchy and filters
Both are created when you declare the table, not at runtime, so it
shouldn't matter to you anyway.

On Thursday, November 7, 2013, Nasron Cheong wrote:

> Why is that? Afaik everything is just a byte sequence, what prevents
> non-printable chars from being used in CF/table names?
>
> - Nasron
>
>
> On Thu, Nov 7, 2013 at 8:39 AM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
>
> > This is fine for the key. Just so you are aware, you cannot use this for
> > table names and CF names, since they must contain printable characters only.
> >
> > JM
> >
> >
> > 2013/11/6 Nasron Cheong <[EMAIL PROTECTED]>
> >
> > > Yes, after some digging around, the key is to store integers in their byte
> > > representation, but more importantly to store them as big-endian so that
> > > the lexicographical sequence is maintained.
> > >
> > > Thanks!
> > >
> > > - Nasron
> > >
> > >
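The big-endian point above can be illustrated with a minimal sketch. It assumes non-negative bucket numbers that fit in 4 bytes, and it relies on Python's `bytes` comparison being unsigned and lexicographic, which matches how HBase orders qualifiers. The helper name `encode_qualifier` and the choice to drop the `:` separator (the fixed width makes it unnecessary) are illustrative assumptions, not something stated in the thread.

```python
import struct


def encode_qualifier(bucket: int, msg_type: str) -> bytes:
    """Hypothetical helper: 4-byte unsigned big-endian bucket, then the msg type."""
    return struct.pack(">I", bucket) + msg_type.encode("utf-8")


buckets = [1, 1000, 2, 256]

# Big-endian: the most significant byte comes first, so byte order
# agrees with numeric order.
big_sorted = sorted(encode_qualifier(b, "abc") for b in buckets)
big_decoded = [struct.unpack(">I", q[:4])[0] for q in big_sorted]

# Little-endian for comparison: the least significant byte comes first,
# so 256 (00 01 00 00) sorts before 1 (01 00 00 00).
little_sorted = sorted(struct.pack("<I", b) for b in buckets)
little_decoded = [struct.unpack("<I", q)[0] for q in little_sorted]

print(big_decoded)     # numeric order
print(little_decoded)  # not numeric order
```

Signed integers would need extra care (for example, flipping the sign bit), since negative values in two's complement compare as larger than positive ones byte-wise.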
> > > On Tue, Nov 5, 2013 at 8:28 PM, Premal Shah <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > You can store the fixed-length byte representation of the integer
> > > > instead of the integer as a string (which has variable length); the
> > > > qualifiers will then also be sorted.
> > > >
> > > >
> > > > On Tue, Nov 5, 2013 at 1:58 PM, Nasron Cheong <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Yes, it's limited in the sense that we have to precalculate the number of
> > > > > digits required so we don't run out, and if we overestimate, then our row
> > > > > keys end up taking more space than we'd like.
> > > > >
> > > > > We can probably live with this approach for now, but I wonder if there's a
> > > > > better way.
> > > > >
> > > > > - Nasron
> > > > >
> > > > >
> > > > > On Tue, Nov 5, 2013 at 12:28 PM, Jean-Marc Spaggiari <
> > > > > [EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Hi Nasron,
> > > > > >
> > > > > > Why are you saying that it's a limited way? Does it meet your needs?
> > > > > >
> > > > > >
> > > > > > 2013/11/4 Nasron Cheong <[EMAIL PROTECTED]>
> > > > > >
> > > > > > > An example query would be the following. Say the column qualifier was of
> > > > > > > the form
> > > > > > >
> > > > > > > <bucket #>:<msg type>
> > > > > > >
> > > > > > > where <bucket #> is an integer value and <msg type> is a string. E.g.
> > > > > > >
> > > > > > > 1:abc
> > > > > > > 1000:abc
> > > > > > > 2:abc
> > > > > > >
> > > > > > > would appear in the above sequence, which is out of order when doing prefix
> > > > > > > filtering. Zero padding could fix this:
> > > > > > >
> > > > > > > 0001:abc
> > > > > > > 0002:abc
> > > > > > > 1000:abc
> > > > > > >
> > > > > > > But this is a limited way of ensuring the sequence of CQs (column qualifiers) is
> > > > > > > correct, in order for prefix filtering to work. Are there other options?
> > > > > > >
> > > > > > > - Nasron
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Oct 31, 2013 at 9:19 PM, Nasron Cheong <[EMAIL PROTECTED]> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I'm trying to determine the best way to serialize a sequence
> of
> > > > > >