HBase >> mail # user >> Lucene instead of HFiles?


Re: Lucene instead of HFiles?
Hi Otis,

My initial reaction was "interesting idea". On second thought, though, I do not see how this makes more sense than what we have now. HFiles combined with Bloom filters are already fast for lookups. Adding Lucene as another "storage engine" (getting us close to Voldemort or MySQL with their replaceable storage backends) does not seem to add any value, and it might even have a few drawbacks. Range scans especially will suffer, since the block-oriented layout of HFiles plus caching makes for really fast I/O. Lucene is built for search, not for transferring xyz-bytes of data. And simply replacing the block index and Bloom filters with Lucene is, I think, overkill as well. Just saying.
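
To make the point about block indexes and Bloom filters concrete, here is a toy sketch in Python. It is not HBase's actual HFile format or API: the names ToyBloom and ToySortedFile, the block size, and the hashing scheme are all made up for illustration. It only shows the general idea, assuming a sorted, immutable file: a point lookup asks the Bloom filter first and then binary-searches a sparse index of first-keys, while a range scan seeks once and then reads blocks sequentially.

```python
import bisect
import hashlib


class ToyBloom:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToySortedFile:
    """Hypothetical immutable sorted key/value file split into blocks."""

    def __init__(self, items, block_size=4):
        ordered = sorted(items)
        self.blocks = [ordered[i:i + block_size]
                       for i in range(0, len(ordered), block_size)]
        # Sparse block index: only the first key of each block is kept.
        self.index = [block[0][0] for block in self.blocks]
        self.bloom = ToyBloom()
        for key, _ in ordered:
            self.bloom.add(key)

    def get(self, key):
        """Point lookup: Bloom check, index search, then one block read."""
        if not self.bloom.might_contain(key):
            return None  # definitely absent, no block is touched
        i = bisect.bisect_right(self.index, key) - 1
        if i < 0:
            return None  # key sorts before the first block
        for k, v in self.blocks[i]:
            if k == key:
                return v
        return None  # Bloom false positive

    def scan(self, start, stop):
        """Range scan: yield (key, value) with start <= key < stop."""
        i = max(bisect.bisect_right(self.index, start) - 1, 0)
        for block in self.blocks[i:]:  # sequential block reads
            for k, v in block:
                if k >= stop:
                    return
                if k >= start:
                    yield k, v
```

For example, after building `f = ToySortedFile([(f"row{i:03d}", i) for i in range(20)])`, calling `f.get("row007")` touches a single block, and `list(f.scan("row005", "row009"))` walks consecutive blocks in key order. Sequential access like that is what block caching speeds up, and it is the access pattern a search-oriented index is not designed around.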

Lars

On Oct 5, 2012, at 5:34 AM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Has anyone attempted using Lucene instead of HFiles (see
> https://twitter.com/otisg/status/254047978174701568 )?
>
> Is that a completely crazy, bad, would-never-work,
> don't-bother-trying-this-at-home, it's-too-late-go-to-sleep idea? Or
> not?
>
> Thanks,
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html