Re: Schema design for filters
Hi,

I see. Btw, isn't HBase overkill for < 1M rows?
Note that Lucene is schemaless, and both Solr and Elasticsearch can
detect field types, so in a way they are schemaless, too.

Otis
--
Performance Monitoring -- http://sematext.com/spm

On Fri, Jun 28, 2013 at 2:53 PM, Kristoffer Sjögren <[EMAIL PROTECTED]> wrote:
> @Otis
>
> HBase is a natural fit for my use case because it's schemaless. I'm
> building a configuration management system, and there is no need for
> advanced filtering/querying capabilities, just basic predicate logic
> and pagination that scales to < 1 million rows with reasonable
> performance.
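>
> For the pagination piece, here is roughly what I have in mind -- just
> a sketch against the standard HBase client API, where "conf", the
> "config" table name, "lastRowSeen", and the page size of 25 are all
> made up for illustration:
>
>   // org.apache.hadoop.hbase.client.*, filter.PageFilter, util.Bytes
>   HTable table = new HTable(conf, "config");
>   Scan scan = new Scan();
>   // resume just past the previous page; the extra zero byte skips the
>   // last row that was already returned
>   scan.setStartRow(Bytes.add(lastRowSeen, new byte[] {0}));
>   scan.setFilter(new PageFilter(25));   // caps rows per region server
>   ResultScanner scanner = table.getScanner(scan);
>   int count = 0;
>   for (Result result : scanner) {
>     if (++count > 25) break;            // client-side cap across regions
>     lastRowSeen = result.getRow();
>     // ... deserialize column values into the object's fields ...
>   }
>   scanner.close();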
>
> Thanks for the tip!
>
>
> On Fri, Jun 28, 2013 at 8:34 PM, Otis Gospodnetic
> <[EMAIL PROTECTED]> wrote:
>
>> Kristoffer,
>>
>> You could also consider using something other than HBase, something
>> that supports "secondary indices", like anything that is Lucene based
>> - Solr and ElasticSearch for example.  We recently compared how we
>> aggregate data in HBase (see my signature) and how we would do it if
>> we were to use Solr (or ElasticSearch), and so far things look better
>> in Solr for our use case.  And our use case involves a lot of
>> filtering, slicing, and dicing... something to consider...
>>
>> Otis
>> --
>> Solr & ElasticSearch Support -- http://sematext.com/
>> Performance Monitoring -- http://sematext.com/spm
>>
>>
>>
>> On Fri, Jun 28, 2013 at 5:24 AM, Kristoffer Sjögren <[EMAIL PROTECTED]>
>> wrote:
>> > Interesting. I'm actually building something similar.
>> >
>> > A full-blown SQL implementation is a bit overkill for my particular
>> > use case, and the query API is the final piece of the puzzle. But
>> > I'll definitely have a look for some inspiration.
>> >
>> > Thanks!
>> >
>> >
>> >
>> > On Fri, Jun 28, 2013 at 3:55 AM, James Taylor
>> > <[EMAIL PROTECTED]> wrote:
>> >
>> >> Hi Kristoffer,
>> >> Have you had a look at Phoenix
>> >> (https://github.com/forcedotcom/phoenix)? You could model your
>> >> schema much like an O/R mapper and issue SQL queries through Phoenix
>> >> for your filtering.
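>> >>
>> >> To make that concrete, a Phoenix query is issued like plain JDBC.
>> >> This is only a sketch -- the config table, its columns, and the
>> >> localhost ZooKeeper quorum are invented for illustration:
>> >>
>> >>   // java.sql.*; the Phoenix driver registers itself on the classpath
>> >>   Connection conn =
>> >>       DriverManager.getConnection("jdbc:phoenix:localhost");
>> >>   PreparedStatement stmt = conn.prepareStatement(
>> >>       "SELECT id, name FROM config WHERE type = ? LIMIT 25");
>> >>   stmt.setString(1, "datasource");
>> >>   ResultSet rs = stmt.executeQuery();
>> >>   while (rs.next()) {
>> >>     System.out.println(rs.getString("name"));  // as with any JDBC app
>> >>   }
>> >>   conn.close();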
>> >>
>> >> James
>> >> @JamesPlusPlus
>> >> http://phoenix-hbase.blogspot.com
>> >>
>> >> On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" <[EMAIL PROTECTED]>
>> >> wrote:
>> >>
>> >> > Thanks for your help, Mike. Much appreciated.
>> >> >
>> >> > I don't store rows/columns in JSON format. The schema is exactly
>> >> > that of a specific Java class, where the rowkey is a unique object
>> >> > identifier with the class type encoded into it. Columns are the
>> >> > field names of the class, and the values are those of the object
>> >> > instance.
>> >> >
>> >> > I did think about coprocessors, but the schema is discovered at
>> >> > runtime and I can't hard-code it.
>> >> >
>> >> > However, I still believe that filters might work. I had a look at
>> >> > SingleColumnValueFilter, and this filter is able to target specific
>> >> > column qualifiers with specific WritableByteArrayComparables.
>> >> >
>> >> > But list comparators are still missing... so I guess the only way
>> >> > is to write these comparators myself?
>> >> >
>> >> > Do you follow my reasoning? Will it work?
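>> >> >
>> >> > In case it helps to see it, this is the kind of comparator I mean --
>> >> > only a rough sketch against the 0.94-era filter API, with the list
>> >> > decoding left as a hypothetical containsElement() helper since my
>> >> > encoding is discovered at runtime:
>> >> >
>> >> >   // org.apache.hadoop.hbase.filter.WritableByteArrayComparable
>> >> >   // Matches when the serialized list in the cell value contains the
>> >> >   // target element; Writable serialization of the wrapped element
>> >> >   // is inherited from the base class.
>> >> >   public class ListContainsComparator
>> >> >       extends WritableByteArrayComparable {
>> >> >     public ListContainsComparator() {}               // for Writable
>> >> >     public ListContainsComparator(byte[] element) { super(element); }
>> >> >
>> >> >     @Override
>> >> >     public int compareTo(byte[] value, int offset, int length) {
>> >> >       // decode the list from value[offset..offset+length) and return
>> >> >       // 0 on a match so that CompareOp.EQUAL accepts the cell
>> >> >       return containsElement(value, offset, length) ? 0 : 1;
>> >> >     }
>> >> >   }
>> >> >
>> >> > which would then plug into the filter like so:
>> >> >
>> >> >   SingleColumnValueFilter filter = new SingleColumnValueFilter(
>> >> >       family, qualifier, CompareOp.EQUAL,
>> >> >       new ListContainsComparator(target));
>> >> >
>> >> > (The comparator class would also have to be on the region servers'
>> >> > classpath, since the filter deserializes it server-side.)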
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Fri, Jun 28, 2013 at 12:58 AM, Michael Segel
>> >> > <[EMAIL PROTECTED]> wrote:
>> >> >
>> >> >> Ok...
>> >> >>
>> >> >> If you want to do type checking and schema enforcement...
>> >> >>
>> >> >> You will need to do this as a coprocessor.
>> >> >>
>> >> >> The quick and dirty way (not recommended) would be to hard-code
>> >> >> the schema into the coprocessor code.
>> >> >>
>> >> >> A better way: at startup, load up ZK to manage the set of known
>> >> >> table schemas, which would be a map of column qualifier to data
>> >> >> type. (If JSON, then you need to do a separate lookup to get the
>> >> >> record's schema.)
>> >> >>
>> >> >> Then a single Java class does the lookup and then dispatches to
>> >> >> the known data-type comparators.
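>> >> >>
>> >> >> Something like this, roughly -- just a sketch, where the DataType
>> >> >> enum and the ZK loading are invented for illustration:
>> >> >>
>> >> >>   // java.util.Map; org.apache.hadoop.hbase.util.Bytes
>> >> >>   // Maps column qualifier -> data type (loaded from ZK at startup)
>> >> >>   // and dispatches each comparison to the right typed compare.
>> >> >>   public class SchemaAwareComparator {
>> >> >>     private final Map<String, DataType> schema;
>> >> >>
>> >> >>     public SchemaAwareComparator(Map<String, DataType> schema) {
>> >> >>       this.schema = schema;
>> >> >>     }
>> >> >>
>> >> >>     public int compare(String qualifier, byte[] cell, byte[] target) {
>> >> >>       switch (schema.get(qualifier)) {
>> >> >>         case LONG:   return Long.compare(Bytes.toLong(cell),
>> >> >>                                          Bytes.toLong(target));
>> >> >>         case STRING: return Bytes.toString(cell)
>> >> >>                          .compareTo(Bytes.toString(target));
>> >> >>         default:     return Bytes.compareTo(cell, target);
>> >> >>       }
>> >> >>     }
>> >> >>   }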
>> >> >>
>> >> >> Does this make sense?