HBase >> mail # user >> Lucene instead of HFiles?


Re: Lucene instead of HFiles?
Hi Renaud,

On Fri, Oct 5, 2012 at 4:48 AM, Renaud Delbru <[EMAIL PROTECTED]> wrote:
> Hi,
>
> With respect to point 3, I know there is a new codec in Lucene 4.0 for
> append-only filesystems such as HDFS (LUCENE-2373).

Yeah.  Though I think nobody wants to search indices directly in HDFS
for performance reasons.

> Also, it would depend on the use case. At the moment, for storing data,
> I would expect HFile to be much more efficient in terms of compression than
> Lucene's file format (in fact, there is no real compression, apart from
> compressing the field byte stream yourself before storing it). There is some
> work to make Lucene more efficient for small and medium-sized fields
> (LUCENE-4226 - block-style compression and storing), but I think HFile is
> far more optimised for this task.
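
For illustration, the "compress the field byte stream yourself" workaround
mentioned above could look roughly like the following. This is a minimal
sketch that assumes plain DEFLATE from java.util.zip; the class and method
names (FieldCompression, compress, decompress) are hypothetical, and nothing
here is Lucene or HBase API. The compressed bytes would then be handed to
whatever binary stored field the index uses.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: compresses a field value before it is stored
// and restores it after it is read back.
class FieldCompression {

    // Compress the raw field bytes with DEFLATE (zlib wrapper).
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(32, raw.length / 2));
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end(); // release native zlib memory
        return out.toByteArray();
    }

    // Reverse of compress(); rejects truncated or malformed input.
    static byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(32, packed.length * 2));
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Repetitive sample value, standing in for a stored field.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("user=renaud;topic=hfile;");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        byte[] restored = decompress(packed);
        System.out.println("raw bytes:        " + raw.length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round trip ok:    " + java.util.Arrays.equals(raw, restored));
    }
}
```

Compressing each value on its own like this tends to work poorly for short
fields, since there is little redundancy inside a single small value; that is
the gap the block-style approach referenced in the quoted text is aimed at.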

I wouldn't know... though I was under the impression there has been
other work around packing things tightly both on disk and in memory.
Check http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene
... slide 16, etc.
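
As a rough illustration of what "packing things tightly" can mean, here is a
minimal sketch of fixed-width bit packing: each value gets only as many bits
as the largest value needs, and values are laid end to end inside a long[].
This is not taken from the linked slides and is not Lucene's implementation;
the class name PackedLongs and its methods are hypothetical.

```java
// Hypothetical fixed-width bit-packed array of non-negative values.
class PackedLongs {
    private final long[] blocks;     // backing storage, 64 bits per block
    private final int bitsPerValue;  // width of every stored value
    private final int size;          // number of logical values
    private final long mask;         // low bitsPerValue bits set

    PackedLongs(int size, int bitsPerValue) {
        if (size < 0 || bitsPerValue < 1 || bitsPerValue > 63) {
            throw new IllegalArgumentException("size >= 0 and 1 <= bitsPerValue <= 63 required");
        }
        this.size = size;
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) size * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) / 64)];
    }

    void set(int index, long value) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        if ((value & ~mask) != 0) throw new IllegalArgumentException("value does not fit");
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        // Clear the slot, then write the low part of the value.
        blocks[block] = (blocks[block] & ~(mask << offset)) | (value << offset);
        int spill = offset + bitsPerValue - 64; // bits that overflow into the next block
        if (spill > 0) {
            int lowBits = bitsPerValue - spill;
            long highMask = mask >>> lowBits;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (value >>> lowBits);
        }
    }

    long get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        long v = blocks[block] >>> offset;
        int spill = offset + bitsPerValue - 64;
        if (spill > 0) {
            // Pull the remaining high bits from the next block.
            v |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return v & mask;
    }

    // Bytes used by the backing array, for comparison with 8 bytes per long.
    long ramBytes() {
        return 8L * blocks.length;
    }

    public static void main(String[] args) {
        PackedLongs packed = new PackedLongs(1000, 5); // values in 0..31
        for (int i = 0; i < 1000; i++) {
            packed.set(i, i % 32);
        }
        System.out.println("value at 37:  " + packed.get(37));
        System.out.println("packed bytes: " + packed.ramBytes());
        System.out.println("plain long[]: " + (8L * 1000));
    }
}
```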

> In fact, another interesting idea would be to investigate the use of HFile
> as a StoredFieldFormat in Lucene. Efficient storage of data in Lucene is
> imho quite a missing feature.

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
> On 05/10/12 07:36, Adrien Mogenet wrote:
>>
>> "Don't bother trying this in production" ;-)
>>
>> 1. Are you sure lookups by key are faster?
>> 2. Updating Lucene files in a lock-free manner and ensuring good
>> concurrency can be a bit tricky.
>> 3. AFAIK, Lucene files don't fit in HDFS, and thus another distributed
>> storage is required. Katta does not look as powerful as Hadoop.
>>
>> On Fri, Oct 5, 2012 at 5:34 AM, Otis Gospodnetic
>> <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi,
>>>
>>> Has anyone attempted using Lucene instead of HFiles (see
>>> https://twitter.com/otisg/status/254047978174701568 )?
>>>
>>> Is that a completely crazy, bad, would-never-work,
>>> don't-bother-trying-this-at-home, it's-too-late-go-to-sleep idea? Or
>>> not?
>>>
>>> Thanks,
>>> Otis
>>> --
>>> Search Analytics - http://sematext.com/search-analytics/index.html
>>> Performance Monitoring - http://sematext.com/spm/index.html
>>
>>
>>
>>
>