HBase >> mail # user >> Schema design for filters


Re: Schema design for filters
This doesn't make sense: the OP wants a schemaless structure, yet also wants filtering on columns. The issue is that you do have a limited schema, so "schemaless" is a misnomer.

In order to filter, you need to enforce an object type within a column, which requires a schema to be enforced.

Again, this can be done in HBase.
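
To illustrate why a filter needs to know the type stored in a column, here is a minimal sketch in plain Java. It does not use the HBase API; the class and method names are hypothetical, and the unsigned byte comparison only stands in for what a raw byte comparator would see. The same two numbers order differently depending on how they were encoded, so a "less than" predicate is only meaningful once the column's encoding is fixed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration: none of these names come from HBase itself.
class TypedColumnCompare {

    // Encode an int as 4 big-endian bytes (one possible fixed-width encoding).
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Encode the same number as its decimal string in UTF-8.
    static byte[] encodeString(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic comparison of raw bytes (requires Java 9+).
    static int compareBytes(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        // As fixed-width ints, 9 sorts before 10 (holds for non-negative values).
        System.out.println(compareBytes(encodeInt(9), encodeInt(10)) < 0);
        // As strings, "9" sorts after "10" because '9' > '1'.
        System.out.println(compareBytes(encodeString(9), encodeString(10)) > 0);
    }
}
```

Both lines print `true`: the byte-level ordering agrees with numeric ordering only for the fixed-width encoding, which is the kind of per-column type agreement meant above.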

Sent from a remote device. Please excuse any typos...

Mike Segel

On Jun 28, 2013, at 4:30 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:

> Yep. Other DBs like Mongo may have the stuff you need out of the box.
> Another option is to encode the whole class using Avro and write a
> filter on top of that.
> You basically use one column and store it there.
> Yes, you pay the penalty of loading your entire class and extracting the
> fields you need to compare against, but I'm really not sure the other way
> is faster, taking into account the hint mechanism in Filter, which is
> pinpointed thus grabs more bytes than it needs to.
>
> Back to what was said earlier: 1M rows, why not MySQL?
>
> On Friday, June 28, 2013, Otis Gospodnetic wrote:
>
>> Hi,
>>
>> I see.  Btw, isn't HBase overkill for < 1M rows?
>> Note that Lucene is schemaless and both Solr and Elasticsearch can
>> detect field types, so in a way they are schemaless, too.
>>
>> Otis
>> --
>> Performance Monitoring -- http://sematext.com/spm
>>
>>
>>
>> On Fri, Jun 28, 2013 at 2:53 PM, Kristoffer Sjögren <[EMAIL PROTECTED]>
>> wrote:
>>> @Otis
>>>
>>> HBase is a natural fit for my use case because it's schemaless. I'm
>>> building a configuration management system and there is no need for
>>> advanced filtering/querying capabilities, just basic predicate logic and
>>> pagination that scales to < 1 million rows with reasonable performance.
>>>
>>> Thanks for the tip!
>>>
>>>
>>> On Fri, Jun 28, 2013 at 8:34 PM, Otis Gospodnetic <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> Kristoffer,
>>>>
>>>> You could also consider using something other than HBase, something
>>>> that supports "secondary indices", like anything that is Lucene based
>>>> - Solr and ElasticSearch for example.  We recently compared how we
>>>> aggregate data in HBase (see my signature) and how we would do it if
>>>> we were to use Solr (or ElasticSearch), and so far things look better
>>>> in Solr for our use case.  And our use case involves a lot of
>>>> filtering, slicing and dicing..... something to consider...
>>>>
>>>> Otis
>>>> --
>>>> Solr & ElasticSearch Support -- http://sematext.com/
>>>> Performance Monitoring -- http://sematext.com/spm
>>>>
>>>>
>>>>
>>>> On Fri, Jun 28, 2013 at 5:24 AM, Kristoffer Sjögren <[EMAIL PROTECTED]>
>>>> wrote:
>>>>> Interesting. I'm actually building something similar.
>>>>>
>>>>> A full-blown SQL implementation is a bit overkill for my particular
>>>>> use case, and the query API is the final piece of the puzzle. But I'll
>>>>> definitely have a look for some inspiration.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 28, 2013 at 3:55 AM, James Taylor <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>
>>>>>> Hi Kristoffer,
>>>>>> Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)?
>>>>>> You could model your schema much like an O/R mapper and issue SQL
>>>>>> queries through Phoenix for your filtering.
>>>>>>
>>>>>> James
>>>>>> @JamesPlusPlus
>>>>>> http://phoenix-hbase.blogspot.com
>>>>>>
>>>>>> On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" <[EMAIL PROTECTED]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for your help Mike. Much appreciated.
>>>>>>>
>>>>>>> I don't store rows/columns in JSON format. The schema is exactly that
>>>>>>> of a specific Java class, where the rowkey is a unique object identifier
>>>>>>> with the class type encoded into it. Columns are the field names of the
>>>>>>> class and the values are those of the object instance.
>>>>>>>
>>>>>>> I did think about coprocessors, but the schema is discovered at runtime
>>>>>>> and I can't hard-code it.
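
The layout described in the earliest quoted message above (a row key carrying the class type plus an object identifier, and one column per field, discovered at runtime) could be sketched roughly as follows. This is plain Java with reflection and no HBase dependency. `ObjectRowMapper`, `ServerConfig`, the `:` separator in the row key, and the string encoding of values are all assumptions made for illustration and are not taken from the thread.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: class and method names are illustrative only.
class ObjectRowMapper {

    // Example configuration class whose fields become columns.
    static class ServerConfig {
        String host = "localhost";
        int port = 8080;
    }

    // Row key = class type + unique object identifier (separator is an assumption).
    static String rowKey(Object bean, String id) {
        return bean.getClass().getSimpleName() + ":" + id;
    }

    // Discover columns at runtime: field name -> value bytes, sorted by name.
    static Map<String, byte[]> toColumns(Object bean) {
        Map<String, byte[]> columns = new TreeMap<>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // skip constants and compiler-generated fields
            }
            try {
                field.setAccessible(true);
                Object value = field.get(bean);
                if (value != null) {
                    // Values are stringified for brevity only.
                    columns.put(field.getName(),
                            String.valueOf(value).getBytes(StandardCharsets.UTF_8));
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read field " + field.getName(), e);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        ServerConfig config = new ServerConfig();
        System.out.println(rowKey(config, "42"));
        System.out.println(toColumns(config).keySet());
    }
}
```

Stringifying every value keeps the sketch short, but an equality or range predicate on `port` would need a per-type encoding agreed on by writer and filter, which is the limited schema discussed at the top of the thread.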