HBase, mail # user - Schema design for filters

Re: Schema design for filters
Kristoffer Sjögren 2013-06-28, 18:53
@Otis

HBase is a natural fit for my use case because it's schemaless. I'm building a
configuration management system, and there is no need for advanced
filtering/querying capabilities, just basic predicate logic and pagination
that scales to < 1 million rows with reasonable performance.

Thanks for the tip!
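
To make "basic predicate logic and pagination" concrete, here is a minimal in-memory sketch in plain Java. It is not HBase client code: the class name, the helper methods, and the comma-separated list encoding are all assumptions made for illustration. Rows are modeled as a map of column qualifier to value, kept sorted by rowkey the way HBase keeps them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names come from the HBase API.
final class PredicatePagingSketch {

    // Builds a row from alternating qualifier/value arguments.
    static Map<String, String> row(String... kv) {
        Map<String, String> r = new TreeMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            r.put(kv[i], kv[i + 1]);
        }
        return r;
    }

    // Matches rows whose column equals the expected value.
    static Predicate<Map<String, String>> columnEquals(String qualifier, String expected) {
        return r -> expected.equals(r.get(qualifier));
    }

    // Matches rows whose column, read as a comma-separated list
    // (an assumed encoding), contains the given element.
    static Predicate<Map<String, String>> listContains(String qualifier, String element) {
        return r -> {
            String raw = r.get(qualifier);
            return raw != null && Arrays.asList(raw.split(",")).contains(element);
        };
    }

    // Returns up to pageSize matching rowkeys that sort strictly after
    // lastSeenKey. Pass null as lastSeenKey to fetch the first page.
    static List<String> page(NavigableMap<String, Map<String, String>> table,
                             String lastSeenKey, int pageSize,
                             Predicate<Map<String, String>> predicate) {
        NavigableMap<String, Map<String, String>> tail =
                lastSeenKey == null ? table : table.tailMap(lastSeenKey, false);
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : tail.entrySet()) {
            if (keys.size() == pageSize) {
                break;
            }
            if (predicate.test(e.getValue())) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }
}
```

Predicates compose with the standard `and`, `or` and `negate` methods of `java.util.function.Predicate`, and each page is requested by passing the last rowkey of the previous page. Paging by last-seen key rather than by offset seems a reasonable fit here because rows are ordered by rowkey, but that is a design assumption, not something the thread settles.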
On Fri, Jun 28, 2013 at 8:34 PM, Otis Gospodnetic <
[EMAIL PROTECTED]> wrote:

> Kristoffer,
>
> You could also consider using something other than HBase, something
> that supports "secondary indices", like anything that is Lucene based
> - Solr and ElasticSearch for example.  We recently compared how we
> aggregate data in HBase (see my signature) and how we would do it if
> we were to use Solr (or ElasticSearch), and so far things look better
> in Solr for our use case.  And our use case involves a lot of
> filtering, slicing and dicing..... something to consider...
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
>
>
> On Fri, Jun 28, 2013 at 5:24 AM, Kristoffer Sjögren <[EMAIL PROTECTED]>
> wrote:
> > Interesting. I'm actually building something similar.
> >
> > A full-blown SQL implementation is a bit overkill for my particular use
> > case, and the query API is the final piece of the puzzle. But I'll
> > definitely have a look for some inspiration.
> >
> > Thanks!
> >
> >
> >
> > On Fri, Jun 28, 2013 at 3:55 AM, James Taylor <[EMAIL PROTECTED]> wrote:
> >
> >> Hi Kristoffer,
> >> Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)?
> >> You could model your schema much like an O/R mapper and issue SQL queries
> >> through Phoenix for your filtering.
> >>
> >> James
> >> @JamesPlusPlus
> >> http://phoenix-hbase.blogspot.com
> >>
> >> On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >> > Thanks for your help Mike. Much appreciated.
> >> >
> >> > I don't store rows/columns in JSON format. The schema is exactly that
> >> > of a specific Java class, where the rowkey is a unique object
> >> > identifier with the class type encoded into it. Columns are the field
> >> > names of the class and the values are those of the object instance.
> >> >
> >> > Did think about coprocessors, but the schema is discovered at runtime
> >> > and I can't hard-code it.
> >> >
> >> > However, I still believe that filters might work. Had a look
> >> > at SingleColumnValueFilter, and this filter is able to target specific
> >> > column qualifiers with specific WritableByteArrayComparables.
> >> >
> >> > But list comparators are still missing... So I guess the only way is
> >> > to write these comparators?
> >> >
> >> > Do you follow my reasoning? Will it work?
> >> >
> >> >
> >> >
> >> >
> >> > On Fri, Jun 28, 2013 at 12:58 AM, Michael Segel
> >> > <[EMAIL PROTECTED]> wrote:
> >> >
> >> >> Ok...
> >> >>
> >> >> If you want to do type checking and schema enforcement...
> >> >>
> >> >> You will need to do this as a coprocessor.
> >> >>
> >> >> The quick and dirty way (not recommended) would be to hard-code the
> >> >> schema into the coprocessor code.
> >> >>
> >> >> A better way... at startup, load up ZK to manage the set of known
> >> >> table schemas, each of which would be a map of column qualifier to
> >> >> data type.
> >> >> (If JSON, then you need to do a separate lookup to get the record's
> >> >> schema.)
> >> >>
> >> >> Then a single Java class that does the lookup and then handles the
> >> >> known data type comparators.
> >> >>
> >> >> Does this make sense?
> >> >> (Sorry, kinda was thinking this out as I typed the response. But it
> >> >> should work.)
> >> >>
> >> >> At least it would be a design approach I would take. YMMV
> >> >>
> >> >> Having said that, I expect someone to say it's a bad idea and that
> >> >> they have a better solution.
> >> >>
> >> >> HTH
> >> >>
> >> >> -Mike
> >> >>
> >> >> On Jun 27, 2013, at 5:13 PM, Kristoffer Sjögren <[EMAIL PROTECTED]>
> >> >> wrote: