Re: Lucene instead of HFiles?
Otis Gospodnetic 2012-10-06, 02:31
On Fri, Oct 5, 2012 at 4:48 AM, Renaud Delbru <[EMAIL PROTECTED]> wrote:
> With respect to point 3, I know there is a new codec in Lucene 4.0 for
> append-only filesystems such as HDFS (LUCENE-2373).
Yeah. Though I think nobody wants to search indices directly in HDFS
for performance reasons.
> Also, it would depend on the use case. At the moment, for storing data,
> I would expect HFile to be much more efficient in terms of compression than
> the Lucene file system (in fact, there is no real compression, apart from
> compressing the field byte stream yourself before storing it). There is some
> work to try to make Lucene more efficient for small and medium-sized fields
> (LUCENE-4226 - block-style compression and storing), but I think HFile is
> far more optimised for this task.
I wouldn't know... though I was under the impression there has been
other work around packing things tightly both on disk and in memory.
... slide 16, etc.
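
For reference, the "compress the field byte stream yourself" workaround
mentioned in the quoted paragraph could look roughly like the sketch below.
It uses only java.util.zip from the JDK; the class and method names
(FieldCompression, compress, decompress) are made up for illustration and
are not part of Lucene or HBase. The compressed bytes would then be stored
as an opaque binary field, and the application would have to decompress
them itself on every read.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: compress a field value before handing it to the index.
class FieldCompression {

    // Deflate the raw field bytes into a new byte array.
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate bytes previously produced by compress().
    static byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and nothing left to read: the input is truncated or corrupt.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or corrupt input");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Repetitive sample value, so deflate has something to work with.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("row-").append(i % 10).append(";value=some stored field;");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println("raw=" + raw.length + " bytes, compressed=" + packed.length + " bytes");
        System.out.println("round trip ok: " + Arrays.equals(raw, decompress(packed)));
    }
}
```

Note that this compresses each field value on its own, so small values gain
little. That is the gap the block-style approach in LUCENE-4226 is meant to
address.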
> In fact, another interesting idea would be to investigate the use of HFile
> as a StoredFieldFormat in Lucene. Efficient storage of data in Lucene is
> imho quite a missing feature.
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
> On 05/10/12 07:36, Adrien Mogenet wrote:
>> "Don't bother trying this in production" ;-)
>> 1. Are you sure lookups by key are faster?
>> 2. Updating Lucene files in a lock-free manner and ensuring good
>> concurrency can be a bit tricky
>> 3. AFAIK, Lucene files don't fit in HDFS and thus another distributed
>> storage is required. Katta does not look as powerful as Hadoop.
>> On Fri, Oct 5, 2012 at 5:34 AM, Otis Gospodnetic
>> <[EMAIL PROTECTED]> wrote:
>>> Has anyone attempted using Lucene instead of HFiles (see
>>> https://twitter.com/otisg/status/254047978174701568 )?
>>> Is that a completely crazy, bad, would-never-work,
>>> don't-bother-trying-this-at-home, it's-too-late-go-to-sleep idea? Or