HBase, mail # user - Schema design for filters


Re: Schema design for filters
Michel Segel 2013-06-28, 23:45
This doesn't make sense: the OP wants a schemaless structure, yet wants filtering on columns. The issue is that you do have a limited schema, so "schemaless" is a misnomer.

In order to do filtering, you need to enforce the object type within a column, which requires a schema to be enforced.

Again, this can be done in HBase.
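
To illustrate the point about enforcing a type per column, here is a minimal sketch in plain Python. It does not use the HBase API. It only assumes that cell values are stored as raw bytes and that a comparison filter compares those bytes lexicographically. The names encode_long, encode_text, rows_greater_than and the "port" column are hypothetical, made up for this example.

```python
import struct


def encode_long(value):
    """Encode a non-negative integer as 8 big-endian bytes (assumed column encoding)."""
    return struct.pack(">q", value)


def encode_text(value):
    """Encode a value as its UTF-8 string form (the 'no enforced type' case)."""
    return str(value).encode("utf-8")


def rows_greater_than(rows, column, threshold):
    """Return row keys whose raw bytes in `column` compare greater than `threshold`."""
    return [
        key
        for key, columns in rows.items()
        if column in columns and columns[column] > threshold
    ]


ports = {"a": 9, "b": 80, "c": 8080}

# Same logical data, two different encodings of the "port" column.
typed_rows = {key: {"port": encode_long(port)} for key, port in ports.items()}
text_rows = {key: {"port": encode_text(port)} for key, port in ports.items()}

# With a fixed-width numeric encoding, byte order matches numeric order.
typed_result = rows_greater_than(typed_rows, "port", encode_long(10))

# With string encoding, b"9" sorts after b"10", so row "a" matches by mistake.
text_result = rows_greater_than(text_rows, "port", encode_text(10))

print(typed_result)
print(text_result)
```

The byte comparison only gives the expected answer when every value in the column uses the same encoding, and that agreement is the schema. The fixed-width trick shown here holds for non-negative integers only; negative values would need a different encoding.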

Sent from a remote device. Please excuse any typos...

Mike Segel

On Jun 28, 2013, at 4:30 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:

> Yep. Other DBs like Mongo may have the stuff you need out of the box.
> Another option is to encode the whole class using Avro and write a
> filter on top of that.
> You basically use one column and store it there.
> Yes, you pay the penalty of loading your entire class and extracting the
> fields you need to compare against, but I'm really not sure the other way
> is faster, taking into account the hint mechanism in Filter which is
> pinpointed thus grabs more bytes than it needs to.
>
> Back to what was said earlier: 1M rows, why not MySQL?
>
> On Friday, June 28, 2013, Otis Gospodnetic wrote:
>
>> Hi,
>>
>> I see.  Btw, isn't HBase overkill for < 1M rows?
>> Note that Lucene is schemaless and both Solr and Elasticsearch can
>> detect field types, so in a way they are schemaless, too.
>>
>> Otis
>> --
>> Performance Monitoring -- http://sematext.com/spm
>>
>>
>>
>> On Fri, Jun 28, 2013 at 2:53 PM, Kristoffer Sjögren <[EMAIL PROTECTED]>
>> wrote:
>>> @Otis
>>>
>>> HBase is a natural fit for my use case because it's schemaless. I'm building a
>>> configuration management system and there is no need for advanced
>>> filtering/querying capabilities, just basic predicate logic and pagination
>>> that scales to < 1 million rows with reasonable performance.
>>>
>>> Thanks for the tip!
>>>
>>>
>>> On Fri, Jun 28, 2013 at 8:34 PM, Otis Gospodnetic <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> Kristoffer,
>>>>
>>>> You could also consider using something other than HBase, something
>>>> that supports "secondary indices", like anything that is Lucene based
>>>> - Solr and ElasticSearch for example.  We recently compared how we
>>>> aggregate data in HBase (see my signature) and how we would do it if
>>>> we were to use Solr (or ElasticSearch), and so far things look better
>>>> in Solr for our use case.  And our use case involves a lot of
>>>> filtering, slicing and dicing..... something to consider...
>>>>
>>>> Otis
>>>> --
>>>> Solr & ElasticSearch Support -- http://sematext.com/
>>>> Performance Monitoring -- http://sematext.com/spm
>>>>
>>>>
>>>>
>>>> On Fri, Jun 28, 2013 at 5:24 AM, Kristoffer Sjögren <[EMAIL PROTECTED]>
>>>> wrote:
>>>>> Interesting. I'm actually building something similar.
>>>>>
>>>>> A full-blown SQL implementation is a bit overkill for my particular use case
>>>>> and the query API is the final piece of the puzzle. But I'll definitely have
>>>>> a look for some inspiration.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jun 28, 2013 at 3:55 AM, James Taylor <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>
>>>>>> Hi Kristoffer,
>>>>>> Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)?
>>>>>> You could model your schema much like an O/R mapper and issue SQL queries
>>>>>> through Phoenix for your filtering.
>>>>>>
>>>>>> James
>>>>>> @JamesPlusPlus
>>>>>> http://phoenix-hbase.blogspot.com
>>>>>>
>>>>>> On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" <[EMAIL PROTECTED]>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for your help Mike. Much appreciated.
>>>>>>>
>>>>>>> I don't store rows/columns in JSON format. The schema is exactly that of a
>>>>>>> specific Java class, where the rowkey is a unique object identifier with
>>>>>>> the class type encoded into it. Columns are the field names of the class
>>>>>>> and the values are those of the object instance.
>>>>>>>
>>>>>>> I did think about coprocessors, but the schema is discovered at runtime and I
>>>>>>> can't hard-code it.
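
The row layout described in the quoted message above (rowkey carrying the class type plus an object identifier, one column per field) could be sketched roughly as follows. This is plain Python, not HBase client code. The "ClassName:id" rowkey format, the string encoding of values, and the ServerConfig class are assumptions made for illustration; the thread does not specify them.

```python
from dataclasses import dataclass, fields


@dataclass
class ServerConfig:
    """Hypothetical configuration class; fields are discovered at runtime."""
    host: str
    port: int


def to_row(object_id, obj):
    """Map an object to (rowkey, {column qualifier: value}), all as bytes.

    Assumed layout: rowkey is "<ClassName>:<object id>", each column
    qualifier is a field name, each value is the field's string form.
    """
    row_key = "{}:{}".format(type(obj).__name__, object_id).encode("utf-8")
    columns = {
        field.name.encode("utf-8"): str(getattr(obj, field.name)).encode("utf-8")
        for field in fields(obj)
    }
    return row_key, columns


row_key, columns = to_row("42", ServerConfig(host="db1", port=5432))
print(row_key)
print(columns)
```

Because the columns are derived from the class at runtime, nothing is hard-coded, but every row of a given class still ends up with the same columns and value types, which is the limited schema the reply at the top refers to.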