HBase, mail # user - Schema design for filters

Re: Schema design for filters
James Taylor 2013-06-28, 01:55
Hi Kristoffer,
Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)? You could model your schema much like an O/R mapper and issue SQL queries through Phoenix for your filtering.
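For illustration only, a Phoenix-style filter query might be assembled as in the sketch below. The table and column names are hypothetical, and the commented-out connection code (JDBC URL format, driver setup) is an assumption that should be checked against the Phoenix documentation.

```java
// Hypothetical helper for building a parameterized filter query to run
// through Phoenix's JDBC driver. Table/column names are made up.
public class PhoenixFilterExample {
    static String buildQuery(String table, String column) {
        // A parameterized query keeps the value out of the SQL text.
        return "SELECT * FROM " + table + " WHERE " + column + " = ?";
    }

    public static void main(String[] args) {
        String sql = buildQuery("MY_OBJECTS", "SCORE");
        System.out.println(sql);
        // Against a running cluster this would be executed roughly like
        // (URL format is an assumption; see the Phoenix docs):
        // try (java.sql.Connection conn =
        //          java.sql.DriverManager.getConnection("jdbc:phoenix:localhost");
        //      java.sql.PreparedStatement ps = conn.prepareStatement(sql)) {
        //     ps.setLong(1, 42L);
        //     try (java.sql.ResultSet rs = ps.executeQuery()) { /* iterate */ }
        // }
    }
}
```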


On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" <[EMAIL PROTECTED]> wrote:

> Thanks for your help Mike. Much appreciated.
> I don't store rows/columns in JSON format. The schema is exactly that of a
> specific Java class, where the rowkey is a unique object identifier with
> the class type encoded into it. Columns are the field names of the class
> and the values are those of the object instance.
> I did think about coprocessors, but the schema is discovered at runtime and I
> can't hard-code it.
> However, I still believe that filters might work. I had a look
> at SingleColumnValueFilter, and this filter is able to target specific
> column qualifiers with specific WritableByteArrayComparables.
> But list comparators are still missing... So I guess the only way is to
> write these comparators?
> Do you follow my reasoning? Will it work?
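The targeting Kristoffer describes can be sketched without any HBase dependency. The class below mimics what SingleColumnValueFilter does conceptually, letting cells from other columns pass through and applying a comparator only to the configured family/qualifier; the class and method names here are made up for illustration and are not the HBase API.

```java
import java.util.Arrays;
import java.util.function.Predicate;

// Self-contained sketch (not the HBase API): apply a value comparator
// only to cells whose family and qualifier match the configured column.
public class ColumnTargetingFilter {
    private final byte[] family;
    private final byte[] qualifier;
    private final Predicate<byte[]> comparator;

    public ColumnTargetingFilter(byte[] family, byte[] qualifier,
                                 Predicate<byte[]> comparator) {
        this.family = family;
        this.qualifier = qualifier;
        this.comparator = comparator;
    }

    // Returns true if the cell should be kept.
    public boolean filterCell(byte[] fam, byte[] qual, byte[] value) {
        if (!Arrays.equals(family, fam) || !Arrays.equals(qualifier, qual)) {
            return true; // cells from other columns pass through untouched
        }
        return comparator.test(value); // typed comparison only on the target column
    }
}
```

Used with, say, a predicate that decodes the value as a long array, this gives the "typed comparator on the right cell" behavior from the message above.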
> On Fri, Jun 28, 2013 at 12:58 AM, Michael Segel
>> Ok...
>> If you want to do type checking and schema enforcement...
>> You will need to do this as a coprocessor.
>> The quick and dirty way... (not recommended) would be to hard-code the
>> schema into the coprocessor code.
>> A better way... at startup, load up ZK to manage the set of known table
>> schemas, which would be a map of column qualifier to data type.
>> (If JSON, then you need to do a separate lookup to get the record's schema.)
>> Then a single Java class that does the lookup and then handles the known
>> data type comparators.
>> Does this make sense?
>> (Sorry, I was kinda thinking this out as I typed the response. But it should
>> work.)
>> At least it would be a design approach I would take. YMMV
>> Having said that, I expect someone to say it's a bad idea and that they
>> have a better solution.
>> HTH
>> -Mike
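Mike's qualifier-to-type map could be sketched as below. The type names and the big-endian decoding via ByteBuffer (matching how HBase's Bytes utility encodes numbers) are assumptions for illustration, and the ZooKeeper loading step is left out.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Sketch of the suggested design: a map of column qualifier -> data type
// (which could be populated from ZooKeeper at startup), plus one dispatcher
// that decodes and compares stored bytes according to the mapped type.
public class SchemaRegistry {
    enum ColType { LONG, INT, STRING }

    private final Map<String, ColType> schema = new HashMap<>();

    public void register(String qualifier, ColType type) {
        schema.put(qualifier, type);
    }

    // Compare a stored cell value against a reference value of the column's type.
    public int compare(String qualifier, byte[] stored, byte[] reference) {
        ColType type = schema.get(qualifier);
        if (type == null) {
            throw new IllegalArgumentException("unknown column: " + qualifier);
        }
        switch (type) {
            case LONG:
                return Long.compare(ByteBuffer.wrap(stored).getLong(),
                                    ByteBuffer.wrap(reference).getLong());
            case INT:
                return Integer.compare(ByteBuffer.wrap(stored).getInt(),
                                       ByteBuffer.wrap(reference).getInt());
            default:
                return new String(stored).compareTo(new String(reference));
        }
    }
}
```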
>> On Jun 27, 2013, at 5:13 PM, Kristoffer Sjögren <[EMAIL PROTECTED]> wrote:
>>> I see your point. Everything is just bytes.
>>> However, the schema is known and every row is formatted according to this
>>> schema, although some columns may not exist, that is, no value exists for
>>> this property on this row.
>>> So if I'm able to apply these "typed comparators" to the right cell values
>>> it may be possible? But I can't find a filter that targets specific
>>> columns?
>>> Seems like all filters scan every column/qualifier and there is no way of
>>> knowing what column is currently being evaluated?
>>> On Thu, Jun 27, 2013 at 11:51 PM, Michael Segel
>>> <[EMAIL PROTECTED]>wrote:
>>>> You have to remember that HBase doesn't enforce any sort of typing.
>>>> That's why this can be difficult.
>>>> You'd have to write a coprocessor to enforce a schema on a table.
>>>> Even then YMMV if you're writing JSON structures to a column because while
>>>> the contents of the structures could be the same, the actual strings could
>>>> differ.
>>>> HTH
>>>> -Mike
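Mike's caveat about JSON in a column can be shown concretely: two documents with identical contents can serialize to different byte arrays, so a raw byte comparison reports them as unequal.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Two JSON strings with the same keys and values, but different key order:
// semantically equal, byte-for-byte different.
public class JsonByteCompare {
    public static void main(String[] args) {
        byte[] a = "{\"x\":1,\"y\":2}".getBytes(StandardCharsets.UTF_8);
        byte[] b = "{\"y\":2,\"x\":1}".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(a, b)); // prints false
    }
}
```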
>>>> On Jun 27, 2013, at 4:41 PM, Kristoffer Sjögren <[EMAIL PROTECTED]> wrote:
>>>>> I realize standard comparators cannot solve this.
>>>>> However I do know the type of each column so writing custom list
>>>>> comparators for boolean, char, byte, short, int, long, float, double seems
>>>>> quite straightforward.
>>>>> Long arrays, for example, are stored as a byte array with 8 bytes per item
>>>>> so a comparator might look like this.
>>>>> public class LongsComparator extends WritableByteArrayComparable {
>>>>>   private long val; // value to search for
>>>>>   public int compareTo(byte[] value, int offset, int length) {
>>>>>       long[] values = BytesUtils.toLongs(value, offset, length);
>>>>>       for (long longValue : values) {
>>>>>           if (longValue == val) return 0; // match found
>>>>>       }
>>>>>       return 1; // no match
>>>>>   }
>>>>> }
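The BytesUtils.toLongs helper referenced in the comparator is not shown in the thread; a plausible implementation, assuming big-endian 8-byte encoding (the default for ByteBuffer and for HBase's Bytes utility), might look like this:

```java
import java.nio.ByteBuffer;

// Hypothetical implementation of the BytesUtils.toLongs helper used above:
// decode a region of a byte[] into a long[] at 8 bytes per element.
public class BytesUtils {
    public static long[] toLongs(byte[] value, int offset, int length) {
        ByteBuffer buf = ByteBuffer.wrap(value, offset, length);
        long[] out = new long[length / 8];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getLong(); // reads 8 bytes, big-endian by default
        }
        return out;
    }
}
```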