HBase >> mail # user >> Schema design for filters


Re: Schema design for filters
I realize standard comparators cannot solve this.

However, I do know the type of each column, so writing custom list
comparators for boolean, char, byte, short, int, long, float and double
seems quite straightforward.

Long arrays, for example, are stored as a byte array with 8 bytes per item,
so a comparator might look like this:

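public class LongsComparator extends WritableByteArrayComparable {
    // Value to search for. Server-side use also needs a no-arg constructor
    // and Writable serialization (write/readFields) covering this field.
    private long val;

    public LongsComparator(long val) {
        this.val = val;
    }

    public int compareTo(byte[] value, int offset, int length) {
        long[] values = BytesUtils.toLongs(value, offset, length);
        for (long longValue : values) {
            if (longValue == val) {
                return 0; // at least one element matches
            }
        }
        return 1; // no element matches
    }
}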

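public static long[] toLongs(byte[] value, int offset, int length) {
    // length is the byte count of the slice, so offset is not subtracted.
    int num = length / 8;
    long[] values = new long[num];
    for (int i = 0; i < num; i++) {
        // org.apache.hadoop.hbase.util.Bytes reads 8 big-endian bytes.
        values[i] = Bytes.toLong(value, offset + i * 8);
    }
    return values;
}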
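
A minimal, self-contained sketch of the layout described above (8 bytes per item), using only java.nio so it runs without HBase on the classpath. The names `LongArrayCodec`, `encode`, `decode` and `compare` are made up for illustration, and big-endian order is assumed (ByteBuffer's default). In a real comparator the matching loop would sit inside compareTo.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of HBase: packs each long as 8 big-endian bytes.
final class LongArrayCodec {

    static byte[] encode(long... values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buffer.putLong(v);
        }
        return buffer.array();
    }

    // Decodes `length` bytes starting at `offset`; trailing bytes that do not
    // fill a whole long are ignored.
    static long[] decode(byte[] value, int offset, int length) {
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
        long[] values = new long[length / Long.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getLong();
        }
        return values;
    }

    // Returns 0 when `wanted` is one of the stored longs and 1 otherwise,
    // mirroring the return convention of the comparator above.
    static int compare(long wanted, byte[] value, int offset, int length) {
        for (long v : decode(value, offset, length)) {
            if (v == wanted) {
                return 0;
            }
        }
        return 1;
    }

    public static void main(String[] args) {
        byte[] cell = encode(1L, 2L, 3L);
        System.out.println(compare(2L, cell, 0, cell.length)); // prints 0
        System.out.println(compare(9L, cell, 0, cell.length)); // prints 1
    }
}
```

Passing a non-zero offset, for example `decode(cell, 8, 16)`, reads only the second and third items, which is the case the offset arithmetic has to get right.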

Strings are similar, but each string would need a length prefix and an agreed charset.

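public class StringsComparator extends WritableByteArrayComparable {
    // Value to search for. Server-side use also needs a no-arg constructor
    // and Writable serialization (write/readFields) covering this field.
    private String val;

    public StringsComparator(String val) {
        this.val = val;
    }

    public int compareTo(byte[] value, int offset, int length) {
        String[] values = BytesUtils.toStrings(value, offset, length);
        for (String stringValue : values) {
            if (val.equals(stringValue)) {
                return 0; // at least one element matches
            }
        }
        return 1; // no element matches
    }
}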

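// Needs java.nio.ByteBuffer, java.nio.charset.StandardCharsets, java.util.ArrayList.
public static String[] toStrings(byte[] value, int offset, int length) {
    // Layout: [4-byte length][string bytes], repeated until the slice ends.
    ArrayList<String> values = new ArrayList<String>();
    ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
    while (buffer.hasRemaining()) {
        int size = buffer.getInt();
        byte[] bytes = new byte[size];
        buffer.get(bytes);
        // Explicit charset instead of the platform default; assumes the
        // values were written as UTF-8.
        values.add(new String(bytes, StandardCharsets.UTF_8));
    }
    return values.toArray(new String[values.size()]);
}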
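
For completeness, a small round-trip sketch of the length-prefixed string layout, again in plain Java without HBase. `StringArrayCodec` and its methods are hypothetical names, and UTF-8 is an assumption; any charset works as long as writer and reader agree on it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: each string is stored as a 4-byte
// length followed by that many UTF-8 bytes.
final class StringArrayCodec {

    static byte[] encode(String... values) {
        List<byte[]> encoded = new ArrayList<>();
        int total = 0;
        for (String s : values) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            encoded.add(bytes);
            total += Integer.BYTES + bytes.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(total);
        for (byte[] bytes : encoded) {
            buffer.putInt(bytes.length).put(bytes);
        }
        return buffer.array();
    }

    // Reads entries until the slice [offset, offset + length) is exhausted.
    static List<String> decode(byte[] value, int offset, int length) {
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
        List<String> values = new ArrayList<>();
        while (buffer.hasRemaining()) {
            byte[] bytes = new byte[buffer.getInt()];
            buffer.get(bytes);
            values.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] cell = encode("a", "b", "Sj\u00f6gren");
        System.out.println(decode(cell, 0, cell.length));
    }
}
```

The length prefix counts bytes, not characters, so a multi-byte character such as "ö" makes the entry one byte longer than the string's length().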

Am I on the right track, or am I overlooking some implementation details?
I'm not really sure how to target each comparator at a specific column value.

On Thu, Jun 27, 2013 at 9:21 PM, Michael Segel <[EMAIL PROTECTED]> wrote:

> Not an easy task.
>
> You first need to determine how you want to store the data within a column
> and/or apply a type constraint to a column.
>
> Even if you use JSON records to store your data within a column, does an
> equality comparator exist? If not, you would have to write one.
> (I kinda think that one may already exist...)
>
>
> On Jun 27, 2013, at 12:59 PM, Kristoffer Sjögren <[EMAIL PROTECTED]> wrote:
>
> > Hi
> >
> > Working with the standard filtering mechanism to scan rows that have
> > columns matching certain criteria.
> >
> > There are columns of numeric (integer and decimal) and string types. These
> > columns are single or multi-valued like "1", "2", "1,2,3", "a", "b" or
> > "a,b,c" - not sure what the separator would be in the case of list types.
> > Maybe none?
> >
> > I would like to compose the following queries to filter out rows that do
> > not match.
> >
> > - contains(String column, String value)
> >  Single valued column that String.contains() provided value.
> >
> > - equal(String column, Object value)
> >  Single valued column that Object.equals() provided value.
> >  Value is either string or numeric type.
> >
> > - greaterThan(String column, java.lang.Number value)
> >  Single valued column that > provided numeric value.
> >
> > - in(String column, Object value...)
> >  Multi-valued column has values that Object.equals() all provided values.
> >  Values are of string or numeric type.
> >
> > How would I design a schema that can take advantage of the already existing
> > filters and comparators to accomplish this?
> >
> > Already looked at the string and binary comparators but fail to see how to
> > solve this in a clean way for multi-valued column values.
> >
> > I'm aware of custom filters but would like to avoid them if possible.
> >
> > Cheers,
> > -Kristoffer
>
>
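
Read literally, the four queries in the quoted question can be sketched as plain Java predicates over already-decoded column values. This is only an illustration of the intended semantics, not HBase filter code: the class name `Predicates` is made up, and `in` is assumed to mean "the column holds all of the provided values", as the description suggests.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, HBase-free sketch of the four queries on decoded values.
final class Predicates {

    // Single-valued string column that contains the given substring.
    static boolean contains(String columnValue, String value) {
        return columnValue.contains(value);
    }

    // Single-valued column equal to the given string or number.
    static boolean equal(Object columnValue, Object value) {
        return columnValue.equals(value);
    }

    // Single-valued numeric column greater than the given number.
    // Compares as double for brevity; very large longs can lose precision.
    static boolean greaterThan(Number columnValue, Number value) {
        return columnValue.doubleValue() > value.doubleValue();
    }

    // Multi-valued column that holds every one of the provided values.
    static boolean in(List<?> columnValues, Object... values) {
        return columnValues.containsAll(Arrays.asList(values));
    }

    public static void main(String[] args) {
        System.out.println(contains("hbase", "bas"));              // prints true
        System.out.println(in(Arrays.asList(1L, 2L, 3L), 1L, 3L)); // prints true
        System.out.println(in(Arrays.asList(1L, 2L, 3L), 4L));     // prints false
    }
}
```

Note that `equal` is type-sensitive: a Long 5 and an Integer 5 are not equal under Object.equals(), so the stored type of each column has to be known up front.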