HBase >> mail # user >> Schema design for filters

Re: Schema design for filters
This doesn't quite make sense: the OP wants a schemaless structure, yet wants filtering on columns. The issue is that you do have a limited schema, so "schemaless" is a misnomer.

In order to do filtering, you need to enforce an object type within each column, which in turn requires a schema.

Again, this can be done in HBase.
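As a sketch of why per-column typing matters here: HBase filters compare raw bytes, so a numeric column only filters correctly if its values are encoded in an order-preserving way. A minimal illustration (plain Java, no HBase dependency; the sign-bit flip is the standard trick so that unsigned lexicographic byte order matches signed numeric order):

```java
import java.nio.ByteBuffer;

public class OrderPreservingEncoding {
    // Encode a long as 8 big-endian bytes with the sign bit flipped,
    // so unsigned lexicographic byte order matches numeric order --
    // the property a byte-comparing filter relies on.
    static byte[] encode(long v) {
        return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
    }

    // Unsigned lexicographic comparison, as a byte-array comparator would do.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }
}
```

If a column mixed encodings (say, some values as text, some as binary longs), the same comparison would give meaningless results, which is the "schema per column" point above.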

Sent from a remote device. Please excuse any typos...

Mike Segel

On Jun 28, 2013, at 4:30 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:

> Yep. Other DBs like
> Mongo may have the stuff you need out of the box.
> Another option is to encode the whole class using Avro, and writing a
> filter on top of that.
> You basically use one column and store it there.
> Yes, you pay the penalty of loading the entire object and extracting the
> fields you need to compare against, but I'm really not sure the other way
> is faster, taking into account that the hint mechanism in Filter is not
> pinpointed and thus grabs more bytes than it needs to.
> Back to what was said earlier: at 1M rows, why not MySQL?
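The single-column approach described above can be sketched like this (names are hypothetical, and a simple line-based encoding stands in for Avro; a real filter would deserialize with Avro's `DatumReader` server-side and apply the predicate there):

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

public class BlobColumn {
    // Serialize a field map into one value, as a single HBase cell would hold it.
    static byte[] serialize(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Deserialize the whole blob, then test a predicate on one field --
    // the "pay the whole-object penalty" trade-off from the thread.
    static boolean matches(byte[] blob, String field, Predicate<String> p) {
        for (String line : new String(blob, StandardCharsets.UTF_8).split("\n")) {
            int i = line.indexOf('=');
            if (i > 0 && line.substring(0, i).equals(field)) {
                return p.test(line.substring(i + 1));
            }
        }
        return false;
    }

    // Hypothetical sample object, for illustration only.
    static Map<String, String> sample() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("name", "foo");
        m.put("version", "3");
        return m;
    }
}
```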
> On Friday, June 28, 2013, Otis Gospodnetic wrote:
>> Hi,
>> I see. Btw, isn't HBase overkill for < 1M rows?
>> Note that Lucene is schemaless and both Solr and Elasticsearch can
>> detect field types, so in a way they are schemaless, too.
>> Otis
>> --
>> Performance Monitoring -- http://sematext.com/spm
>> On Fri, Jun 28, 2013 at 2:53 PM, Kristoffer Sjögren <[EMAIL PROTECTED]>
>> wrote:
>>> @Otis
>>> HBase is a natural fit for my use case because it's schemaless. I'm
>>> building a configuration management system and there is no need for
>>> advanced filtering/querying capabilities, just basic predicate logic and
>>> pagination that scales to < 1 million rows with reasonable performance.
>>> Thanks for the tip!
>>> On Fri, Jun 28, 2013 at 8:34 PM, Otis Gospodnetic <
>>> [EMAIL PROTECTED]> wrote:
>>>> Kristoffer,
>>>> You could also consider using something other than HBase, something
>>>> that supports "secondary indices", like anything that is Lucene based
>>>> - Solr and ElasticSearch for example.  We recently compared how we
>>>> aggregate data in HBase (see my signature) and how we would do it if
>>>> we were to use Solr (or ElasticSearch), and so far things look better
>>>> in Solr for our use case.  And our use case involves a lot of
>>>> filtering, slicing and dicing..... something to consider...
>>>> Otis
>>>> --
>>>> Solr & ElasticSearch Support -- http://sematext.com/
>>>> Performance Monitoring -- http://sematext.com/spm
>>>> On Fri, Jun 28, 2013 at 5:24 AM, Kristoffer Sjögren <[EMAIL PROTECTED]>
>>>> wrote:
>>>>> Interesting. I'm actually building something similar.
>>>>> A full-blown SQL implementation is a bit overkill for my particular
>>>>> use case, and the query API is the final piece of the puzzle. But I'll
>>>>> definitely have a look for some inspiration.
>>>>> Thanks!
>>>>> On Fri, Jun 28, 2013 at 3:55 AM, James Taylor <[EMAIL PROTECTED]
>>>>> wrote:
>>>>>> Hi Kristoffer,
>>>>>> Have you had a look at Phoenix
>>>>>> (https://github.com/forcedotcom/phoenix)?
>>>>>> You could model your schema much like an O/R mapper and issue SQL
>>>> queries
>>>>>> through Phoenix for your filtering.
>>>>>> James
>>>>>> @JamesPlusPlus
>>>>>> http://phoenix-hbase.blogspot.com
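For flavor, a hedged sketch of what the Phoenix route might look like. The table and column names are hypothetical; with the Phoenix JDBC driver on the classpath, strings like these would be executed through a standard `java.sql.PreparedStatement` against a `jdbc:phoenix:<zookeeper>` connection:

```java
public class PhoenixSketch {
    // Hypothetical DDL: one table per object type, modeled like an O/R mapping.
    static final String DDL =
        "CREATE TABLE IF NOT EXISTS config_object ("
      + " object_id VARCHAR PRIMARY KEY,"
      + " name VARCHAR,"
      + " version BIGINT)";

    // Hypothetical filter query; the '?' placeholders would be bound via
    // PreparedStatement, and Phoenix compiles the WHERE clause down to
    // HBase scans and filters for you.
    static final String QUERY =
        "SELECT object_id, name FROM config_object"
      + " WHERE version >= ? LIMIT ?";
}
```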
>>>>>> On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" <[EMAIL PROTECTED]>
>>>>>> wrote:
>>>>>>> Thanks for your help Mike. Much appreciated.
>>>>>>> I don't store rows/columns in JSON format. The schema is exactly that
>>>>>>> of a specific Java class, where the rowkey is a unique object identifier
>>>>>>> with the class type encoded into it. Columns are the field names of the
>>>>>>> class and the values are those of the object instance.
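The rowkey layout described above can be sketched as follows (plain Java, no HBase client; the NUL separator is an assumption, and a real implementation must pick a separator that cannot occur in the class name):

```java
import java.nio.charset.StandardCharsets;

public class ObjectRowKey {
    // Row key = class type + NUL separator + unique object id, so all
    // instances of one type sort contiguously and can be prefix-scanned.
    static byte[] rowKey(String className, String objectId) {
        return (className + '\u0000' + objectId).getBytes(StandardCharsets.UTF_8);
    }

    // Recover the class type encoded into a row key.
    static String typeOf(byte[] rowKey) {
        String s = new String(rowKey, StandardCharsets.UTF_8);
        return s.substring(0, s.indexOf('\u0000'));
    }
}
```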
>>>>>>> I did think about coprocessors, but the schema is discovered at runtime
>>>>>>> and I can't hard-code it.