HBase >> mail # user >> Schema design for filters


Kristoffer Sjögren 2013-06-27, 17:59
Michael Segel 2013-06-27, 19:21
Kristoffer Sjögren 2013-06-27, 21:41
Michael Segel 2013-06-27, 21:51
Kristoffer Sjögren 2013-06-27, 22:13
Michael Segel 2013-06-27, 22:58
Kristoffer Sjögren 2013-06-27, 23:39
James Taylor 2013-06-28, 01:55
Kristoffer Sjögren 2013-06-28, 09:24
Otis Gospodnetic 2013-06-28, 18:34
Re: Schema design for filters
@Otis

HBase is a natural fit for my use case because it's schemaless. I'm building a
configuration management system and there is no need for advanced
filtering/querying capabilities, just basic predicate logic and pagination
that scales to < 1 million rows with reasonable performance.
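
For the basic-predicate-plus-pagination requirement, the usual HBase pattern is a scan with a start row and a page-size cap, carrying the last-seen rowkey forward as the next page's start row. A plain-Java sketch of that token-passing logic, with a TreeMap standing in for the sorted table and all names hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.function.IntPredicate;

public class PagedScan {
    // Scan rows at or after startKey, keep those whose value matches pred,
    // and stop once pageSize matches are collected. nextKeyOut[0] receives
    // the rowkey to resume from, or null when the table is exhausted.
    public static List<String> page(NavigableMap<String, Integer> rows,
                                    String startKey, int pageSize,
                                    IntPredicate pred, String[] nextKeyOut) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : rows.tailMap(startKey, true).entrySet()) {
            if (out.size() == pageSize) {      // page full: resume here next call
                nextKeyOut[0] = e.getKey();
                return out;
            }
            if (pred.test(e.getValue())) {
                out.add(e.getKey());
            }
        }
        nextKeyOut[0] = null;                  // no further pages
        return out;
    }
}
```

Against HBase itself this maps to Scan.setStartRow for the resume token, with the predicate pushed down as a filter where possible.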

Thanks for the tip!
On Fri, Jun 28, 2013 at 8:34 PM, Otis Gospodnetic <
[EMAIL PROTECTED]> wrote:

> Kristoffer,
>
> You could also consider using something other than HBase, something
> that supports "secondary indices", like anything that is Lucene based
> - Solr and ElasticSearch for example.  We recently compared how we
> aggregate data in HBase (see my signature) and how we would do it if
> we were to use Solr (or ElasticSearch), and so far things look better
> in Solr for our use case. And our use case involves a lot of
> filtering, slicing, and dicing... something to consider.
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
>
>
> On Fri, Jun 28, 2013 at 5:24 AM, Kristoffer Sjögren <[EMAIL PROTECTED]>
> wrote:
> > Interesting. I'm actually building something similar.
> >
> > A full-blown SQL implementation is a bit overkill for my particular use
> > case, and the query API is the final piece of the puzzle. But I'll
> > definitely have a look for some inspiration.
> >
> > Thanks!
> >
> >
> >
> > On Fri, Jun 28, 2013 at 3:55 AM, James Taylor <[EMAIL PROTECTED]
> >wrote:
> >
> >> Hi Kristoffer,
> >> Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)?
> >> You could model your schema much like an O/R mapper and issue SQL
> >> queries through Phoenix for your filtering.
> >>
> >> James
> >> @JamesPlusPlus
> >> http://phoenix-hbase.blogspot.com
> >>
> >> On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >> > Thanks for your help Mike. Much appreciated.
> >> >
> >> > I don't store rows/columns in JSON format. The schema is exactly
> >> > that of a specific Java class, where the rowkey is a unique object
> >> > identifier with the class type encoded into it. Columns are the field
> >> > names of the class and the values are those of the object instance.
> >> >
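The rowkey layout described above (a unique object identifier with the class type encoded into it) can be sketched in plain Java. The separator byte and the exact layout here are assumptions, not details from the thread:

```java
import java.nio.charset.StandardCharsets;

public class TypedRowKey {
    // Build a rowkey as <class name> 0x00 <object id>. The prefix keeps all
    // instances of one type contiguous in HBase's sorted keyspace; the 0x00
    // separator assumes class names never contain a NUL byte.
    public static byte[] encode(String className, String objectId) {
        byte[] type = className.getBytes(StandardCharsets.UTF_8);
        byte[] id = objectId.getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[type.length + 1 + id.length];
        System.arraycopy(type, 0, key, 0, type.length);
        key[type.length] = 0x00;                       // separator byte
        System.arraycopy(id, 0, key, type.length + 1, id.length);
        return key;
    }

    // Recover the class type from a rowkey by scanning to the separator.
    public static String typeOf(byte[] key) {
        int sep = 0;
        while (key[sep] != 0x00) sep++;
        return new String(key, 0, sep, StandardCharsets.UTF_8);
    }
}
```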
> >> > I did think about coprocessors, but the schema is discovered at
> >> > runtime and I can't hard-code it.
> >> >
> >> > However, I still believe that filters might work. I had a look at
> >> > SingleColumnValueFilter, and this filter is able to target specific
> >> > column qualifiers with specific WritableByteArrayComparables.
> >> >
> >> > But list comparators are still missing... so I guess the only way is
> >> > to write these comparators?
> >> >
> >> > Do you follow my reasoning? Will it work?
> >> >
> >> >
> >> >
> >> >
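The missing "list comparator" boils down to byte-level matching over a serialized list. A plain-Java sketch of the core logic such a comparator could wrap (the real one would extend HBase's WritableByteArrayComparable base class; the length-prefixed encoding here is an assumption, not anything from the thread):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ListContainsMatcher {
    // Return true if the encoded list contains an element byte-equal to
    // target. Each element is assumed to be length-prefixed: a 4-byte int
    // length followed by that many bytes.
    public static boolean contains(byte[] encodedList, byte[] target) {
        ByteBuffer buf = ByteBuffer.wrap(encodedList);
        while (buf.remaining() >= 4) {
            int len = buf.getInt();
            byte[] elem = new byte[len];
            buf.get(elem);
            if (Arrays.equals(elem, target)) return true;
        }
        return false;
    }

    // Helper to build an encoded list in the same length-prefixed layout.
    public static byte[] encode(byte[]... elems) {
        int total = 0;
        for (byte[] e : elems) total += 4 + e.length;
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] e : elems) { buf.putInt(e.length); buf.put(e); }
        return buf.array();
    }
}
```

Wrapped in a WritableByteArrayComparable subclass, this logic could then be handed to SingleColumnValueFilter for server-side evaluation.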
> >> > On Fri, Jun 28, 2013 at 12:58 AM, Michael Segel
> >> > <[EMAIL PROTECTED]>wrote:
> >> >
> >> >> Ok...
> >> >>
> >> >> If you want to do type checking and schema enforcement...
> >> >>
> >> >> You will need to do this as a coprocessor.
> >> >>
> >> >> The quick and dirty way (not recommended) would be to hard-code the
> >> >> schema into the coprocessor code.
> >> >>
> >> >> A better way: at startup, load up ZK to manage the set of known table
> >> >> schemas, which would be a map of column qualifier to data type.
> >> >> (If JSON, then you need to do a separate lookup to get the record's
> >> >> schema.)
> >> >>
> >> >> Then a single Java class that does the lookup and then handles the
> >> >> known data type comparators.
> >> >>
> >> >> Does this make sense?
> >> >> (Sorry, I was kind of thinking this out as I typed the response. But
> >> >> it should work.)
> >> >>
> >> >> At least it would be a design approach I would take. YMMV.
> >> >>
> >> >> Having said that, I expect someone to say it's a bad idea and that
> >> >> they have a better solution.
> >> >>
> >> >> HTH
> >> >>
> >> >> -Mike
> >> >>
> >> >> On Jun 27, 2013, at 5:13 PM, Kristoffer Sjögren <[EMAIL PROTECTED]>
> >> wrote:
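The lookup-then-compare step Michael describes can be sketched with an in-memory map standing in for the ZK-loaded schema. The qualifiers and the two-type enum here are hypothetical placeholders:

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class SchemaRegistry {
    enum ColType { INT, STRING }

    // Map of column qualifier -> data type. In the design above this would
    // be populated from ZooKeeper at startup; here it is filled by hand.
    private final Map<String, ColType> schema = new HashMap<>();

    public void register(String qualifier, ColType type) {
        schema.put(qualifier, type);
    }

    // Compare a stored value against an expected one, dispatching on the
    // column's registered type rather than raw lexicographic byte order.
    public int compare(String qualifier, byte[] stored, byte[] expected) {
        ColType t = schema.get(qualifier);
        if (t == null) throw new IllegalArgumentException("unknown qualifier: " + qualifier);
        switch (t) {
            case INT:
                return Integer.compare(ByteBuffer.wrap(stored).getInt(),
                                       ByteBuffer.wrap(expected).getInt());
            default:
                return new String(stored).compareTo(new String(expected));
        }
    }
}
```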
Otis Gospodnetic 2013-06-28, 18:58
Asaf Mesika 2013-06-28, 21:30
Michel Segel 2013-06-28, 23:45
Kristoffer Sjögren 2013-06-29, 11:29
Michael Segel 2013-06-28, 12:45