HBase >> mail # user >> Hbase filter-SubstringComparator vs full text search indexing


Shengjie Min 2012-09-10, 14:24
Otis Gospodnetic 2012-09-10, 17:41
Re: Hbase filter-SubstringComparator vs full text search indexing
Two cents below...

On Mon, Sep 10, 2012 at 7:24 AM, Shengjie Min <[EMAIL PROTECTED]> wrote:

> In my case, I have all the log events stored in HDFS/hbase in this format:
>
> timestamp | priority | category | message body
>
> Given I have only 4 fields here, my queries are limited to these four.
> I am thinking about more advanced search, such as full-text search on
> the message body. Well, mainly substring queries against the message body.
>
>    1. Has anybody tried to use the HBase SubstringComparator? How does it
>       perform with a reasonably huge amount of data? Can it still provide
>       real-time response capability?
>

Probably not, if "huge" is sufficiently large.  HBase indexes data only by
row key, so a search on any other criterion (such as a substring of the
message body) requires a full scan of the data.
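
A rough sketch of why, in plain Python rather than the HBase API. The sorted
list below is only a toy stand-in for a table ordered by row key; the row keys,
the sample events, and the helper names get_by_row_key and scan_for_substring
are all made up for illustration.

```python
from bisect import bisect_left

# Toy stand-in for a table kept sorted by row key (hypothetical data,
# not the HBase API). Columns: row key, priority, category, message body.
ROWS = sorted([
    ("20120910-0001", "INFO",  "auth", "user alice logged in"),
    ("20120910-0002", "ERROR", "db",   "connection timeout to replica"),
    ("20120910-0003", "WARN",  "auth", "password retry for bob"),
    ("20120910-0004", "ERROR", "web",  "upstream timeout on /search"),
])
KEYS = [row[0] for row in ROWS]


def get_by_row_key(key):
    """Binary search on the sorted keys: touches O(log n) entries."""
    i = bisect_left(KEYS, key)
    if i < len(KEYS) and KEYS[i] == key:
        return ROWS[i]
    return None


def scan_for_substring(needle):
    """Substring match on the message body: has to examine every row."""
    examined = 0
    hits = []
    for row in ROWS:
        examined += 1
        if needle in row[3]:
            hits.append(row[0])
    return hits, examined
```

The lookup by key stays cheap as the table grows, while the substring match
always examines every row, so its cost grows linearly with the data.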

>    2. In my case, does it make more sense to use a proper full-text search
>       engine (Lucene/Solr/ElasticSearch) to index the message body? Does that
>       sound like a better idea?
>

Often yes.  For big data especially, this is where ElasticSearch excels.
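
To illustrate the general idea behind such engines, here is a minimal inverted
index in plain Python. It is only a sketch, not how Lucene, Solr, or
ElasticSearch are actually implemented; the sample events and the names
tokenize, build_index, and search are assumptions for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(events):
    """Map each term to the set of row keys whose message contains it."""
    index = defaultdict(set)
    for row_key, message in events.items():
        for term in tokenize(message):
            index[term].add(row_key)
    return index


def search(index, query):
    """Return the row keys that contain every term in the query (AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data: row key -> message body.
EVENTS = {
    "20120910-0001": "user alice logged in",
    "20120910-0002": "connection timeout to replica",
    "20120910-0003": "password retry for bob",
    "20120910-0004": "upstream timeout on /search",
}
INDEX = build_index(EVENTS)
```

With this, search(INDEX, "timeout") goes straight to the matching row keys
without reading the other messages. Note that the index matches whole terms:
a query for "time" finds nothing here. Arbitrary substring matching usually
needs extra work on top, for example indexing n-grams of each term.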

>
> It would be great if someone experienced could share some stories here.
>
> -Shengjie Min
>