Re: Schema design for filters
You have to remember that HBase doesn't enforce any sort of typing.
That's why this can be difficult.

You'd have to write a coprocessor to enforce a schema on a table.
Even then, YMMV if you're writing JSON structures to a column: the contents of two structures can be the same while the actual strings differ (key order or whitespace, for example), so a byte-wise comparison will not treat them as equal.
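
To make the JSON point concrete: a binary comparator only sees bytes, so two records with the same fields in a different order compare as unequal. One way around that is to write a canonical form. The following is a minimal sketch, assuming flat string-to-string records with no characters that need escaping; `CanonicalRecord` is a hypothetical helper, not an HBase class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not an HBase class: serializes a flat record with
// its keys in sorted order so that equal contents produce equal bytes.
// Assumes keys and values contain no characters that need JSON escaping.
final class CanonicalRecord {
    private CanonicalRecord() {}

    static byte[] toBytes(Map<String, String> record) {
        StringBuilder sb = new StringBuilder("{");
        // TreeMap iterates in key order regardless of insertion order.
        for (Map.Entry<String, String> e : new TreeMap<>(record).entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":\"")
              .append(e.getValue()).append('"');
        }
        return sb.append('}').toString().getBytes(StandardCharsets.UTF_8);
    }

    // Byte-for-byte equality, which is all a binary comparator can see.
    static boolean sameBytes(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }
}
```

If every writer goes through the same canonical serializer, a plain binary equality comparison on the column value becomes meaningful again.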

HTH

-Mike

On Jun 27, 2013, at 4:41 PM, Kristoffer Sjögren <[EMAIL PROTECTED]> wrote:

> I realize standard comparators cannot solve this.
>
> However I do know the type of each column so writing custom list
> comparators for boolean, char, byte, short, int, long, float, double seems
> quite straightforward.
>
> Long arrays, for example, are stored as a byte array with 8 bytes per item
> so a comparator might look like this.
>
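> public class LongsComparator extends WritableByteArrayComparable {
>    // Value to search for; assumed to be set by a constructor (not shown).
>    private long val;
>
>    // Returns 0 if any stored long equals val, 1 otherwise.
>    public int compareTo(byte[] value, int offset, int length) {
>        long[] values = BytesUtils.toLongs(value, offset, length);
>        for (long longValue : values) {
>            if (longValue == val) {
>                return 0;
>            }
>        }
>        return 1;
>    }
> }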
>
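> public static long[] toLongs(byte[] value, int offset, int length) {
>    // length is the byte count of the slice, so it is not reduced by offset.
>    int num = length / 8;
>    long[] values = new long[num];
>    for (int i = 0; i < num; i++) {
>        // Read each long relative to the start of the slice.
>        values[i] = getLong(value, offset + i * 8);
>    }
>    return values;
> }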
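
The 8-bytes-per-item layout described above can be exercised without HBase on the classpath. The following is a minimal sketch, assuming big-endian longs as written by `java.nio.ByteBuffer`; `LongArrayCodec` and its method names are hypothetical and stand in for the `BytesUtils` helper in the quoted message.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not an HBase class): packs longs as 8 big-endian
// bytes each and checks membership on an encoded slice.
final class LongArrayCodec {
    private LongArrayCodec() {}

    static byte[] encode(long... values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * 8);
        for (long v : values) {
            buffer.putLong(v);
        }
        return buffer.array();
    }

    static long[] decode(byte[] value, int offset, int length) {
        // wrap() starts reading at offset and stops at offset + length.
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
        long[] values = new long[length / 8];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getLong();
        }
        return values;
    }

    // Returns 0 when target is one of the encoded longs, 1 otherwise,
    // following the 0/1 convention of the quoted comparator.
    static int compareContains(byte[] value, int offset, int length, long target) {
        for (long v : decode(value, offset, length)) {
            if (v == target) {
                return 0;
            }
        }
        return 1;
    }
}
```

For example, `LongArrayCodec.compareContains(LongArrayCodec.encode(1L, 2L, 3L), 8, 16, 1L)` looks only at the last two items of the array, so the offset handling matters as soon as the value is a slice of a larger buffer.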
>
>
> Strings are similar but would require charset and length for each string.
>
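> public class StringsComparator extends WritableByteArrayComparable  {
>    // Value to search for; assumed to be set by a constructor (not shown).
>    private String val;
>
>    // Returns 0 if any stored string equals val, 1 otherwise.
>    public int compareTo(byte[] value, int offset, int length) {
>        String[] values = BytesUtils.toStrings(value, offset, length);
>        for (String stringValue : values) {
>            if (val.equals(stringValue)) {
>                return 0;
>            }
>        }
>        return 1;
>    }
> }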
>
> public static String[] toStrings(byte[] value, int offset, int length) {
>    ArrayList<String> values = new ArrayList<String>();
>    int idx = 0;
>    ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
>    while (idx < length) {
>        int size = buffer.getInt();
>        byte[] bytes = new byte[size];
>        buffer.get(bytes);
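>        // Decode with an explicit charset (UTF-8 assumed) instead of the platform default.
>        values.add(new String(bytes, StandardCharsets.UTF_8));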
>        idx += 4 + size;
>    }
>    return values.toArray(new String[values.size()]);
> }
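
The length-prefixed string layout can be checked with a round trip. The following is a minimal sketch, assuming a 4-byte big-endian length before each string and UTF-8 as the charset; `StringArrayCodec` is a hypothetical helper, not an HBase class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an HBase class): stores each string as a
// 4-byte length followed by its UTF-8 bytes.
final class StringArrayCodec {
    private StringArrayCodec() {}

    static byte[] encode(String... values) {
        List<byte[]> encoded = new ArrayList<byte[]>();
        int total = 0;
        for (String s : values) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            encoded.add(bytes);
            total += 4 + bytes.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(total);
        for (byte[] bytes : encoded) {
            buffer.putInt(bytes.length).put(bytes);
        }
        return buffer.array();
    }

    static String[] decode(byte[] value, int offset, int length) {
        // The buffer's own position replaces a hand-maintained index.
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
        List<String> values = new ArrayList<String>();
        while (buffer.hasRemaining()) {
            byte[] bytes = new byte[buffer.getInt()];
            buffer.get(bytes);
            values.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return values.toArray(new String[0]);
    }
}
```

Fixing the charset on both the write and the read side keeps the stored bytes independent of the JVM default, which can differ between the client and the region servers.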
>
>
> Am I on the right track or maybe overlooking some implementation details?
> Not really sure how to target each comparator to a specific column value?
>
>
> On Thu, Jun 27, 2013 at 9:21 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
>> Not an easy task.
>>
>> You first need to determine how you want to store the data within a column
>> and/or apply a type constraint to a column.
>>
>> Even if you use JSON records to store your data within a column, does an
>> equality comparator exist? If not, you would have to write one.
>> (I kinda think that one may already exist...)
>>
>>
>> On Jun 27, 2013, at 12:59 PM, Kristoffer Sjögren <[EMAIL PROTECTED]> wrote:
>>
>>> Hi
>>>
>>> I am working with the standard filtering mechanism to scan rows that have
>>> columns matching certain criteria.
>>>
>>> There are columns of numeric (integer and decimal) and string types. These
>>> columns are single or multi-valued like "1", "2", "1,2,3", "a", "b" or
>>> "a,b,c" - not sure what the separator would be in the case of list types.
>>> Maybe none?
>>>
>>> I would like to compose the following queries to filter out rows that do
>>> not match.
>>>
>>> - contains(String column, String value)
>>> Single-valued column whose value String.contains() the provided value.
>>>
>>> - equal(String column, Object value)
>>> Single-valued column whose value Object.equals() the provided value.
>>> Value is either a string or a numeric type.
>>>
>>> - greaterThan(String column, java.lang.Number value)
>>> Single-valued column whose value is > the provided numeric value.
>>>
>>> - in(String column, Object value...)
>>> Multi-valued column have values that Object.equals() all provided