HBase, mail # user - Schema design for filters


+
Kristoffer Sjögren 2013-06-27, 17:59
+
Michael Segel 2013-06-27, 19:21
Re: Schema design for filters
Kristoffer Sjögren 2013-06-27, 21:41
I realize standard comparators cannot solve this.

However, I do know the type of each column, so writing custom list
comparators for boolean, char, byte, short, int, long, float and double
seems quite straightforward.

Long arrays, for example, are stored as a byte array with 8 bytes per item,
so a comparator might look like this:

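public class LongsComparator extends WritableByteArrayComparable {
    // Nullary constructor, needed for Writable deserialization.
    public LongsComparator() {}

    // The target is kept in the inherited byte[] value so that it is
    // serialized along with the comparator (uses HBase's Bytes utility).
    public LongsComparator(long val) {
        super(Bytes.toBytes(val));
    }

    @Override
    public int compareTo(byte[] value, int offset, int length) {
        long val = Bytes.toLong(getValue());
        long[] values = BytesUtils.toLongs(value, offset, length);
        for (long longValue : values) {
            if (longValue == val) {
                return 0; // at least one item matches
            }
        }
        return 1; // no item matches
    }
}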

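public static long[] toLongs(byte[] value, int offset, int length) {
    // length is already the number of bytes to read; offset is only the start.
    int num = length / 8;
    long[] values = new long[num];
    for (int i = 0; i < num; i++) {
        // getLong is a helper that reads 8 bytes at the given position.
        values[i] = getLong(value, offset + i * 8);
    }
    return values;
}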
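
For completeness, here is a minimal sketch of the write side of that layout, using only java.nio. The class name LongArrayCodec and its methods are hypothetical and not part of HBase; it assumes big-endian, 8 bytes per item and no separator, which is what ByteBuffer produces by default.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, not part of HBase: packs longs 8 bytes each, big-endian.
class LongArrayCodec {

    // Write side: every long occupies exactly 8 bytes, with no separator.
    static byte[] encode(long... values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * 8);
        for (long v : values) {
            buffer.putLong(v);
        }
        return buffer.array();
    }

    // Read side: decode `length` bytes starting at `offset`.
    static long[] decode(byte[] value, int offset, int length) {
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
        long[] values = new long[length / 8];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getLong();
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] cell = encode(1L, 2L, 3L);
        System.out.println(cell.length);                          // 24
        System.out.println(Arrays.toString(decode(cell, 8, 16))); // [2, 3]
    }
}
```

Decoding with a non-zero offset, as in the second call, is the case where mixing up offset and length goes wrong, so it is worth covering in a unit test.
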
Strings are similar but would require charset and length for each string.

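public class StringsComparator extends WritableByteArrayComparable {
    // Nullary constructor, needed for Writable deserialization.
    public StringsComparator() {}

    // The target is kept in the inherited byte[] value (UTF-8 via HBase's
    // Bytes utility) so that it is serialized along with the comparator.
    public StringsComparator(String val) {
        super(Bytes.toBytes(val));
    }

    @Override
    public int compareTo(byte[] value, int offset, int length) {
        String val = Bytes.toString(getValue());
        String[] values = BytesUtils.toStrings(value, offset, length);
        for (String stringValue : values) {
            if (val.equals(stringValue)) {
                return 0; // at least one item matches
            }
        }
        return 1; // no item matches
    }
}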

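// Needs java.nio.ByteBuffer, java.nio.charset.StandardCharsets and java.util.ArrayList.
public static String[] toStrings(byte[] value, int offset, int length) {
    // Each string is stored as a 4-byte length followed by its bytes.
    // UTF-8 is assumed here; it must match the charset used when writing.
    ArrayList<String> values = new ArrayList<String>();
    ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
    while (buffer.hasRemaining()) {
        int size = buffer.getInt();
        byte[] bytes = new byte[size];
        buffer.get(bytes);
        values.add(new String(bytes, StandardCharsets.UTF_8));
    }
    return values.toArray(new String[values.size()]);
}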
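
A matching sketch for writing the string layout, again with plain java.nio only. StringArrayCodec is a hypothetical name, and the 4-byte length prefix plus UTF-8 encoding are assumptions that mirror the decoder above rather than an existing HBase format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: length-prefixed UTF-8 strings.
class StringArrayCodec {

    // Write side: for each string, a 4-byte length followed by its UTF-8 bytes.
    static byte[] encode(String... values) {
        List<byte[]> encoded = new ArrayList<byte[]>();
        int total = 0;
        for (String s : values) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            encoded.add(bytes);
            total += 4 + bytes.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(total);
        for (byte[] bytes : encoded) {
            buffer.putInt(bytes.length);
            buffer.put(bytes);
        }
        return buffer.array();
    }

    // Read side: same loop as toStrings, bounded by the wrapped region.
    static String[] decode(byte[] value, int offset, int length) {
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
        List<String> values = new ArrayList<String>();
        while (buffer.hasRemaining()) {
            byte[] bytes = new byte[buffer.getInt()];
            buffer.get(bytes);
            values.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return values.toArray(new String[0]);
    }

    public static void main(String[] args) {
        byte[] cell = encode("a", "b", "c");
        System.out.println(cell.length);                  // 15
        System.out.println(String.join(",", decode(cell, 0, cell.length))); // a,b,c
    }
}
```

The length prefix means no separator character is needed, so values may themselves contain commas.
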
Am I on the right track, or am I overlooking some implementation details?
I am also not sure how to target each comparator at a specific column value.
On Thu, Jun 27, 2013 at 9:21 PM, Michael Segel <[EMAIL PROTECTED]> wrote:

> Not an easy task.
>
> You first need to determine how you want to store the data within a column
> and/or apply a type constraint to a column.
>
> Even if you use JSON records to store your data within a column, does an
> equality comparator exist? If not, you would have to write one.
> (I kinda think that one may already exist...)
>
>
> On Jun 27, 2013, at 12:59 PM, Kristoffer Sjögren <[EMAIL PROTECTED]> wrote:
>
> > Hi
> >
> > I am working with the standard filtering mechanism to scan rows that have
> > columns matching certain criteria.
> >
> > There are columns of numeric (integer and decimal) and string types. These
> > columns are single or multi-valued like "1", "2", "1,2,3", "a", "b" or
> > "a,b,c" - not sure what the separator would be in the case of list types.
> > Maybe none?
> >
> > I would like to compose the following queries to filter out rows that do
> > not match.
> >
> > - contains(String column, String value)
> >  Single-valued column whose value contains the provided value
> >  (String.contains()).
> >
> > - equal(String column, Object value)
> >  Single-valued column whose value equals the provided value
> >  (Object.equals()). Value is either a string or a numeric type.
> >
> > - greaterThan(String column, java.lang.Number value)
> >  Single-valued column whose value is > the provided numeric value.
> >
> > - in(String column, Object... values)
> >  Multi-valued column whose values include all provided values
> >  (Object.equals()). Values are of string or numeric type.
> >
> > How would I design a schema that can take advantage of the already
> > existing filters and comparators to accomplish this?
> >
> > I have already looked at the string and binary comparators but fail to
> > see how to solve this in a clean way for multi-valued column values.
> >
> > I'm aware of custom filters but would like to avoid them if possible.
> >
> > Cheers,
> > -Kristoffer
>
>
+
Michael Segel 2013-06-27, 21:51
+
Kristoffer Sjögren 2013-06-27, 22:13
+
Michael Segel 2013-06-27, 22:58
+
Kristoffer Sjögren 2013-06-27, 23:39
+
James Taylor 2013-06-28, 01:55
+
Kristoffer Sjögren 2013-06-28, 09:24
+
Otis Gospodnetic 2013-06-28, 18:34
+
Kristoffer Sjögren 2013-06-28, 18:53
+
Otis Gospodnetic 2013-06-28, 18:58
+
Asaf Mesika 2013-06-28, 21:30
+
Michel Segel 2013-06-28, 23:45
+
Kristoffer Sjögren 2013-06-29, 11:29
+
Michael Segel 2013-06-28, 12:45