HBase >> mail # user >> Schema design for filters


+
Kristoffer Sjögren 2013-06-27, 17:59
+
Michael Segel 2013-06-27, 19:21
+
Kristoffer Sjögren 2013-06-27, 21:41
+
Michael Segel 2013-06-27, 21:51
+
Kristoffer Sjögren 2013-06-27, 22:13
+
Michael Segel 2013-06-27, 22:58
+
Kristoffer Sjögren 2013-06-27, 23:39
+
James Taylor 2013-06-28, 01:55
+
Kristoffer Sjögren 2013-06-28, 09:24
+
Otis Gospodnetic 2013-06-28, 18:34
+
Kristoffer Sjögren 2013-06-28, 18:53
+
Otis Gospodnetic 2013-06-28, 18:58
+
Asaf Mesika 2013-06-28, 21:30
+
Michel Segel 2013-06-28, 23:45
Re: Schema design for filters
In terms of scalability, yes, but we use HBase for other stuff as well:
time series, counters and a few future ideas around analytics. So it's nice
if we can put everything in the same deployment.

We don't want users to care about the physical storage (keep them productive
in Java land). The point of being schemaless here is to relieve users of
defining and administering the schema, types, sizes, indexes, queries etc.
for every class. Write the class and you're done: no extra implementation
overhead, with a very simple query API that works on actual Java types,
nothing else.

Btw, I have already written a schemaless implementation in SQL and it's
kinda painful to implement efficient WHERE queries for less than and greater
than if you don't know the target type. HBase's extensibility and freedom are
actually quite amazing on this point.

I have done some prototyping on filters now (after looking at Phoenix) and
I think the implementation is quite straightforward. But I haven't decided
whether to split fields into qualifiers or store the instance as a blob. I
think I'm leaning towards a custom binary format that is able to seek fields
through the blob efficiently.
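
One possible shape for such a seekable blob, as a sketch only: the layout
below (a field count, then a table of offsets, then the field bytes) is an
assumption for illustration, not the format actually chosen in this thread,
and the class name is hypothetical. The offset table lets a filter jump
straight to one field without decoding the fields before it.

```java
import java.nio.ByteBuffer;

// Hypothetical layout: [int fieldCount][int offset per field][field bytes...]
final class IndexedBlob {

    // Packs the given fields into a single blob with an offset table.
    static byte[] pack(byte[][] fields) {
        int headerSize = 4 + 4 * fields.length;
        int total = headerSize;
        for (byte[] field : fields) {
            total += field.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        buf.putInt(fields.length);
        int offset = headerSize;
        for (byte[] field : fields) {
            buf.putInt(offset);      // absolute start position of this field
            offset += field.length;
        }
        for (byte[] field : fields) {
            buf.put(field);
        }
        return buf.array();
    }

    // Reads one field by index using only the header, without scanning
    // the other fields.
    static byte[] readField(byte[] blob, int index) {
        ByteBuffer buf = ByteBuffer.wrap(blob);
        int count = buf.getInt(0);
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("field " + index);
        }
        int start = buf.getInt(4 + 4 * index);
        int end = (index + 1 < count) ? buf.getInt(4 + 4 * (index + 1)) : blob.length;
        byte[] out = new byte[end - start];
        System.arraycopy(blob, start, out, 0, out.length);
        return out;
    }
}
```

The trade-off against one qualifier per field is that the whole blob is
still read from storage; the header only saves decoding work inside the
filter.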
On Sat, Jun 29, 2013 at 1:45 AM, Michel Segel <[EMAIL PROTECTED]> wrote:

> This doesn't make sense in that the OP wants a schemaless structure, yet
> wants filtering on columns. The issue is that you do have a limited schema,
> so "schemaless" is a misnomer.
>
> In order to do filtering, you need to enforce object type within a column
> which requires a Schema to be enforced.
>
> Again, this can be done in HBase.
>
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Jun 28, 2013, at 4:30 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:
>
> > Yep. Other DBs like
> > Mongo may have the stuff you need out of the box.
> > Another option is to encode the whole class using Avro, and writing a
> > filter on top of that.
> > You basically use one column and store it there.
> > Yes, you pay the penalty of loading your entire class and extracting the
> > fields you need to compare against, but I'm really not sure the other way
> > is faster, taking into account the hint mechanism in Filter which is
> > pinpointed thus grabs more bytes than it needs to.
> >
> > Back to what was said earlier: 1M rows - why not MySQL?
> >
> > On Friday, June 28, 2013, Otis Gospodnetic wrote:
> >
> >> Hi,
> >>
> >> I see.  Btw. isn't HBase for < 1M rows an overkill?
> >> Note that Lucene is schemaless and both Solr and Elasticsearch can
> >> detect field types, so in a way they are schemaless, too.
> >>
> >> Otis
> >> --
> >> Performance Monitoring -- http://sematext.com/spm
> >>
> >>
> >>
> >> On Fri, Jun 28, 2013 at 2:53 PM, Kristoffer Sjögren <[EMAIL PROTECTED]>
> >> wrote:
> >>> @Otis
> >>>
> >>> HBase is a natural fit for my use case because it's schemaless. I'm
> >>> building a configuration management system and there is no need for
> >>> advanced filtering/querying capabilities, just basic predicate logic and
> >>> pagination that scales to < 1 million rows with reasonable performance.
> >>>
> >>> Thanks for the tip!
> >>>
> >>>
> >>> On Fri, Jun 28, 2013 at 8:34 PM, Otis Gospodnetic <
> >>> [EMAIL PROTECTED]> wrote:
> >>>
> >>>> Kristoffer,
> >>>>
> >>>> You could also consider using something other than HBase, something
> >>>> that supports "secondary indices", like anything that is Lucene based
> >>>> - Solr and ElasticSearch for example.  We recently compared how we
> >>>> aggregate data in HBase (see my signature) and how we would do it if
> >>>> we were to use Solr (or ElasticSearch), and so far things look better
> >>>> in Solr for our use case.  And our use case involves a lot of
> >>>> filtering, slicing and dicing..... something to consider...
> >>>>
> >>>> Otis
> >>>> --
> >>>> Solr & ElasticSearch Support -- http://sematext.com/
> >>>> Performance Monitoring -- http://sematext.com/spm
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Jun 28, 2013 at 5:24 AM, Kristoffer Sjögren <[EMAIL PROTECTED]
+
Michael Segel 2013-06-28, 12:45