HBase, mail # user - Hbase filter-SubstringComparator vs full text search indexing


Re: Hbase filter-SubstringComparator vs full text search indexing
Otis Gospodnetic 2012-09-10, 17:41
Hello,

If you need to scan lots of log messages and process them, use HBase
(or Hive, or Pig, or simply HDFS+MR).
If you need to query your data set by anything in the text of the log
message, use ElasticSearch, Solr 4.0, Sensei, or just Lucene.
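
To make the difference concrete, here is a toy sketch in plain Python. It does not use the HBase or Lucene APIs; the sample rows and the function names (scan_substring, build_inverted_index, search_term) are made up for illustration. The assumption behind it is that a substring filter has to examine every row it scans, while a search engine answers a term query from an inverted index built ahead of time.

```python
from collections import defaultdict

# Hypothetical sample log events: (timestamp, priority, category, message body)
LOGS = [
    ("2012-09-10T10:00:01", "ERROR", "db", "connection timeout while reading block"),
    ("2012-09-10T10:00:02", "INFO", "web", "request served in 12 ms"),
    ("2012-09-10T10:00:03", "WARN", "db", "slow query detected on region server"),
    ("2012-09-10T10:00:04", "ERROR", "web", "upstream timeout from backend"),
]


def scan_substring(rows, needle):
    """Filter-style scan: every row is examined, so cost grows with table size."""
    return [i for i, row in enumerate(rows) if needle in row[3]]


def build_inverted_index(rows):
    """Index-style: map each whitespace-separated token to the ids of rows containing it."""
    index = defaultdict(set)
    for i, row in enumerate(rows):
        for token in row[3].lower().split():
            index[token].add(i)
    return index


def search_term(index, term):
    """Look up one whole token; no pass over the rows is needed at query time."""
    return sorted(index.get(term.lower(), set()))


INDEX = build_inverted_index(LOGS)

print(scan_substring(LOGS, "timeout"))  # walks all rows
print(search_term(INDEX, "timeout"))    # single dictionary lookup
```

Note that this simple token index only matches whole words: searching for "time" finds nothing, while the scan still matches it inside "timeout". Search engines usually cover arbitrary substrings with n-gram analysis or wildcard queries, so that is worth checking if substring matching is the main requirement.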

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
On Mon, Sep 10, 2012 at 10:24 AM, Shengjie Min <[EMAIL PROTECTED]> wrote:
> In my case, I have all the log events stored in HDFS/hbase in this format:
>
> timestamp | priority | category | message body
>
> Since I have only four fields, my queries are limited to those four. I am
> thinking about more advanced search, such as full-text search on the
> message body; well, mainly substring queries against the message body.
>
>    1. Has anybody tried the HBase SubstringComparator? How does it perform
>       with a reasonably large amount of data? Can it still provide
>       real-time responses?
>    2. In my case, does it make more sense to use a proper full-text search
>       engine (Lucene/Solr/ElasticSearch) to index the message body?
>
> It would be great if someone experienced could share some stories here.
>
> -Shengjie Min