Re: What's the best approach to search in HBase?
Otis Gospodnetic 2011-06-18, 06:27
HBasene is dead. Watch HBASE-3529.
>From: "Hiller, Dean x66079" <[EMAIL PROTECTED]>
>To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>Sent: Friday, June 17, 2011 4:21 PM
>Subject: RE: What's the best approach to search in HBase?
>What about using HBasene? Is it any good? It looks just like a distributed Lucene, with the same API and everything.
>From: Mark Kerzner [mailto:[EMAIL PROTECTED]]
>Sent: Wednesday, June 15, 2011 10:10 PM
>To: [EMAIL PROTECTED]
>Subject: Re: What's the best approach to search in HBase?
>Thank you, everybody. I summarized your advice here,
>http://shmsoft.blogspot.com/2011/06/search-in-ediscovery.html, because I
>need it for my open source eDiscovery, and now just need to try it all :)
>On Mon, Jun 6, 2011 at 11:18 AM, Buttler, David <[EMAIL PROTECTED]> wrote:
>> I store over 500M documents in HBase, and index using Solr with dynamic
>> fields. This gives you tremendous flexibility to do the type of queries you
>> are looking for -- and to make them simple and intuitive via a faceted
>> However, there was quite a bit of software that we had to write to get
>> things going, and I can neither release all of it as open source nor support
>> other people using it. If I had to start again, I would seriously look at
>> solutions like ElasticSearch and Lily.
>> -----Original Message-----
>> From: Mark Kerzner [mailto:[EMAIL PROTECTED]]
>> Sent: Friday, June 03, 2011 5:57 PM
>> To: HBase Discussion Group
>> Subject: What's the best approach to search in HBase?
>> I need to store, say, 10M-100M documents, with each document having say 100
>> fields, like author, creation date, access date, etc., and then I want to
>> ask questions like
>> give me all documents whose author is like abc**, and creation date any
>> in 2010 and access date in 2010-2011, and so on, perhaps 10-20 conditions,
>> matching a list of some keywords.
>> What's best, Lucene, Katta, HBase CF with secondary indices, or plain scan
>> and compare of every record?
>> Thanks a bunch!
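For the kind of query asked about above (author prefix, creation and access date ranges, a keyword list), a Solr request of the sort David describes might be assembled roughly as follows. This is only a sketch: the field names author_s, created_dt, accessed_dt and body_txt are hypothetical and assume a schema with dynamic-field patterns like *_s, *_dt and *_txt; the helper functions and the sample keywords are made up for illustration and are not from anyone's actual setup.

```python
import re
from urllib.parse import urlencode

# Characters with special meaning in Lucene/Solr query syntax, plus whitespace.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/\s])')

def escape_term(term):
    """Backslash-escape query-syntax characters in a user-supplied term."""
    return _SPECIAL.sub(r"\\\1", term)

def prefix(field, value):
    """Prefix match, e.g. author starts with 'abc'."""
    return f"{field}:{escape_term(value)}*"

def year_range(field, first_year, last_year):
    """Inclusive date range covering whole calendar years (UTC)."""
    return (f"{field}:[{first_year}-01-01T00:00:00Z "
            f"TO {last_year}-12-31T23:59:59Z]")

def all_keywords(field, words):
    """Require every keyword to appear in the given text field."""
    return f"{field}:(" + " AND ".join(escape_term(w) for w in words) + ")"

# Hypothetical field names; each clause mirrors one condition in the question.
query = " AND ".join([
    prefix("author_s", "abc"),
    year_range("created_dt", 2010, 2010),
    year_range("accessed_dt", 2010, 2011),
    all_keywords("body_txt", ["contract", "merger"]),
])

# URL-encoded parameters for a select request, with a facet on author.
params = urlencode({
    "q": query,
    "rows": 10,
    "facet": "true",
    "facet.field": "author_s",
})

print(query)
print(params)
```

With 10-20 conditions the clause list simply grows; the documents themselves stay in HBase and only the row key plus the searchable fields would need to go into the index.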