HBase >> mail # user >> Schema design for filters


Kristoffer Sjögren 2013-06-27, 17:59
Michael Segel 2013-06-27, 19:21
Kristoffer Sjögren 2013-06-27, 21:41
Michael Segel 2013-06-27, 21:51
Kristoffer Sjögren 2013-06-27, 22:13
Michael Segel 2013-06-27, 22:58
Re: Schema design for filters
Thanks for your help Mike. Much appreciated.

I don't store rows/columns in JSON format. The schema is exactly that of a
specific Java class, where the rowkey is a unique object identifier with
the class type encoded into it. Columns are the field names of the class
and the values are those of the object instance.

I did think about coprocessors, but the schema is discovered at runtime and I
can't hard-code it.
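
To make "discovered at runtime" concrete, here is a minimal sketch of how a
column-qualifier-to-type map could be derived from a Java class by
reflection. The names RuntimeSchema, discover and Person are hypothetical
and only for illustration; this is not HBase API and not necessarily how the
actual schema discovery is done.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: derives a qualifier -> type map from a class at runtime. */
final class RuntimeSchema {
    private RuntimeSchema() {}

    /** Maps each instance field name (assumed to be the column qualifier) to its declared type. */
    static Map<String, Class<?>> discover(Class<?> type) {
        Map<String, Class<?>> schema = new LinkedHashMap<>();
        for (Field f : type.getDeclaredFields()) {
            // Skip static and compiler-generated fields; they are not stored as columns here.
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            schema.put(f.getName(), f.getType());
        }
        return schema;
    }

    /** Made-up example class standing in for a stored object type. */
    static final class Person {
        String name;
        long[] friendIds;
        boolean active;
    }
}
```

With a map like this, picking a comparator for a column becomes a lookup on
the field's type (for example long[].class for friendIds) instead of
something hard-coded.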

However, I still believe that filters might work. I had a look
at SingleColumnValueFilter, and this filter is able to target specific
column qualifiers with specific WritableByteArrayComparables.

But list comparators are still missing... So I guess the only way is to
write these comparators?
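
As a rough sketch of the matching logic such a list comparator would need
for long arrays, assuming the cell value is a run of 8-byte big-endian longs:
PackedLongs, encode and contains are hypothetical names, and the class
deliberately uses only java.nio so it stands apart from any HBase version.
A real comparator would wrap something like contains() and return 0 on a
match.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper, not part of HBase: membership test on a packed long[] cell value. */
final class PackedLongs {
    private PackedLongs() {}

    /** Encodes longs as consecutive 8-byte big-endian values (ByteBuffer's default order). */
    static byte[] encode(long... values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    /** Returns true if any 8-byte item in value[offset, offset + length) equals target. */
    static boolean contains(byte[] value, int offset, int length, long target) {
        if (length % Long.BYTES != 0) {
            throw new IllegalArgumentException("length is not a multiple of 8: " + length);
        }
        // wrap() sets position = offset and limit = offset + length.
        ByteBuffer buf = ByteBuffer.wrap(value, offset, length);
        while (buf.remaining() >= Long.BYTES) {
            if (buf.getLong() == target) {
                return true;
            }
        }
        return false;
    }
}
```

Note that the number of items depends only on length; offset just says where
the first item starts.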

Do you follow my reasoning? Will it work?
On Fri, Jun 28, 2013 at 12:58 AM, Michael Segel
<[EMAIL PROTECTED]> wrote:

> Ok...
>
> If you want to do type checking and schema enforcement...
>
> You will need to do this as a coprocessor.
>
> The quick and dirty way (not recommended) would be to hard-code the
> schema into the coprocessor code.
>
> A better way... at start up, load up ZK to manage the set of known table
> schemas which would be a map of column qualifier to data type.
> (If JSON, then you need to do a separate lookup to get the record's schema.)
>
> Then a single java class that does the look up and then handles the known
> data type comparators.
>
> Does this make sense?
> (Sorry, kinda was thinking this out as I typed the response. But it should
> work.)
>
> At least it would be a design approach I would take. YMMV
>
> Having said that, I expect someone to say it's a bad idea and that they
> have a better solution.
>
> HTH
>
> -Mike
>
> On Jun 27, 2013, at 5:13 PM, Kristoffer Sjögren <[EMAIL PROTECTED]> wrote:
>
> > I see your point. Everything is just bytes.
> >
> > However, the schema is known and every row is formatted according to this
> > schema, although some columns may not exist, that is, no value exists for
> > this property on this row.
> >
> > So if I'm able to apply these "typed comparators" to the right cell values
> > it may be possible? But I can't find a filter that targets specific
> > columns?
> >
> > Seems like all filters scan every column/qualifier and there is no way of
> > knowing what column is currently being evaluated?
> >
> >
> > On Thu, Jun 27, 2013 at 11:51 PM, Michael Segel
> > <[EMAIL PROTECTED]> wrote:
> >
> >> You have to remember that HBase doesn't enforce any sort of typing.
> >> That's why this can be difficult.
> >>
> >> You'd have to write a coprocessor to enforce a schema on a table.
> >> Even then YMMV if you're writing JSON structures to a column because
> >> while the contents of the structures could be the same, the actual
> >> strings could differ.
> >>
> >> HTH
> >>
> >> -Mike
> >>
> >> On Jun 27, 2013, at 4:41 PM, Kristoffer Sjögren <[EMAIL PROTECTED]> wrote:
> >>
> >>> I realize standard comparators cannot solve this.
> >>>
> >>> However I do know the type of each column so writing custom list
> >>> comparators for boolean, char, byte, short, int, long, float, double
> >>> seems quite straightforward.
> >>>
> >>> Long arrays, for example, are stored as a byte array with 8 bytes per
> >>> item so a comparator might look like this.
> >>>
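> >>> public class LongsComparator extends WritableByteArrayComparable {
> >>>   // The long to look for; it would also need to be serialized via Writable.
> >>>   private long val;
> >>>
> >>>   public LongsComparator(long val) {
> >>>       this.val = val;
> >>>   }
> >>>
> >>>   // Returns 0 (match) if any item in the stored long array equals val.
> >>>   public int compareTo(byte[] value, int offset, int length) {
> >>>       long[] values = BytesUtils.toLongs(value, offset, length);
> >>>       for (long longValue : values) {
> >>>           if (longValue == val) {
> >>>               return 0;
> >>>           }
> >>>       }
> >>>       return 1;
> >>>   }
> >>> }
> >>>
> >>> // In BytesUtils: decodes length / 8 longs, starting at offset.
> >>> public static long[] toLongs(byte[] value, int offset, int length) {
> >>>   int num = length / 8;
> >>>   long[] values = new long[num];
> >>>   for (int i = 0; i < num; i++) {
> >>>       values[i] = getLong(value, offset + i * 8);
> >>>   }
> >>>   return values;
> >>> }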
> >>>
> >>>
> >>> Strings are similar but would require charset and length for each
James Taylor 2013-06-28, 01:55
Kristoffer Sjögren 2013-06-28, 09:24
Otis Gospodnetic 2013-06-28, 18:34
Kristoffer Sjögren 2013-06-28, 18:53
Otis Gospodnetic 2013-06-28, 18:58
Asaf Mesika 2013-06-28, 21:30
Michel Segel 2013-06-28, 23:45
Kristoffer Sjögren 2013-06-29, 11:29
Michael Segel 2013-06-28, 12:45