HBase, mail # user - Schema design for filters
Re: Schema design for filters
Michael Segel 2013-06-27, 19:21
Not an easy task.

You first need to determine how you want to store the data within a column and/or apply a type constraint to a column.

Even if you use JSON records to store your data within a column, does an equality comparator exist? If not, you would have to write one.
(I kinda think that one may already exist...)
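
To illustrate the storage point above: a comparator that works on raw bytes sees the text "10" as smaller than "9", so numbers stored as strings will not support a greater-than filter. Below is a minimal sketch, assuming numeric cells are written as fixed-width big-endian bytes with the sign bit flipped. The class and method names are hypothetical, and plain Java (9+) stands in for a byte-wise comparator; it does not use the HBase API.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, not part of HBase: shows an encoding whose
// unsigned byte order agrees with signed numeric order.
final class OrderedEncoding {

    // Write the long as 8 big-endian bytes with the sign bit flipped,
    // so negative values sort before positive ones byte-wise.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES)
                .putLong(value ^ Long.MIN_VALUE)
                .array();
    }

    // Reverse of encode().
    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    // Stand-in for a byte-wise comparator: unsigned lexicographic order.
    static boolean greaterThan(byte[] stored, byte[] threshold) {
        return Arrays.compareUnsigned(stored, threshold) > 0;
    }

    public static void main(String[] args) {
        // As text, "10" sorts before "9".
        System.out.println("10".compareTo("9") < 0);                   // true
        // With the encoding, byte order matches numeric order.
        System.out.println(greaterThan(encode(10L), encode(9L)));      // true
        System.out.println(greaterThan(encode(3L), encode(-5L)));      // true
    }
}
```

The same idea would need a separate treatment for decimals, which is part of why the type of each column has to be decided up front.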
On Jun 27, 2013, at 12:59 PM, Kristoffer Sjögren <[EMAIL PROTECTED]> wrote:

> Hi
>
> Working with the standard filtering mechanism to scan rows that have
> columns matching certain criteria.
>
> There are columns of numeric (integer and decimal) and string types. These
> columns are single or multi-valued like "1", "2", "1,2,3", "a", "b" or
> "a,b,c" - not sure what the separator would be in the case of list types.
> Maybe none?
>
> I would like to compose the following queries to filter out rows that do
> not match.
>
> - contains(String column, String value)
>  Single valued column whose value String.contains() the provided value.
>
> - equal(String column, Object value)
>  Single valued column that Object.equals() provided value.
>  Value is either string or numeric type.
>
> - greaterThan(String column, java.lang.Number value)
>  Single valued column that > provided numeric value.
>
> - in(String column, Object value...)
>  Multi-valued column has values that Object.equals() all provided values.
>  Values are of string or numeric type.
>
> How would I design a schema that can take advantage of the already existing
> filters and comparators to accomplish this?
>
> Already looked at the string and binary comparators but fail to see how to
> solve this in a clean way for multi-valued column values.
>
> I'm aware of custom filters but would like to avoid them if possible.
>
> Cheers,
> -Kristoffer
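
For reference, the four queries listed above can be pinned down in plain Java before deciding how to map them onto filters. This is only a sketch of the intended semantics on the client side, not an HBase filter. It assumes cells are stored as text, that a comma is the list separator (the original message leaves the separator open), and that numeric equality means equal numeric value; the class name is hypothetical.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical reference semantics for the four queries; cells are text.
final class QuerySemantics {

    // Single-valued cell contains the given substring.
    static boolean contains(String cell, String value) {
        return cell.contains(value);
    }

    // Single-valued cell equals the value; numbers compare by numeric value.
    static boolean equal(String cell, Object value) {
        if (value instanceof Number) {
            return new BigDecimal(cell)
                    .compareTo(new BigDecimal(value.toString())) == 0;
        }
        return cell.equals(value);
    }

    // Single-valued numeric cell is greater than the given number.
    static boolean greaterThan(String cell, Number value) {
        return new BigDecimal(cell)
                .compareTo(new BigDecimal(value.toString())) > 0;
    }

    // Multi-valued cell holds every given value (compared by string form).
    // Assumption: values are comma-separated.
    static boolean in(String cell, Object... values) {
        Set<String> stored = new HashSet<>(Arrays.asList(cell.split(",")));
        for (Object v : values) {
            if (!stored.contains(String.valueOf(v))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(contains("abc", "b"));
        System.out.println(equal("1.0", 1));
        System.out.println(greaterThan("10", 9));
        System.out.println(in("1,2,3", 1, 3));
    }
}
```

Whatever schema is chosen, a server-side filter setup would have to return the same rows as these predicates.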