HBase >> mail # dev >> For HBase compactions - Lucene's IO impact reduction code


Re: For HBase compactions - Lucene's IO impact reduction code
Hi Lars,

Yeah, I was really thinking more about this part being useful for HBase:
"Use the new NativeUnixDirectory which bypasses the OS's IO cache for all merge IO, by using direct IO. This ensures that a merge won't evict hot pages used by searches."

Here it is: https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/misc/src/java/org/apache/lucene/store/NativeUnixDirectory.java

And it looks like this requires something called NativePosixUtil.cpp, which lives in Lucene.  Here is a reference: http://fossies.org/dox/apache-solr-3.6.0-src/NativePosixUtil_8cpp.html

Judging by the lack of discussion around this I'm guessing this is not a big enough itch - either because this is not an actual problem or because we have no way of knowing how much damage compactions are doing to OS buffers.

But you can see some agreement that the above is actually attractive - http://search-hadoop.com/m/waHGf0r3K42 -- from February 2011.
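
To make the idea a bit more concrete, here is a rough sketch of how a context-aware IO policy could look. This is not Lucene or HBase code: IoPurpose, CacheMode and CompactionIoPolicy are names I made up for illustration, loosely modeled on the IOContext idea from the quoted post. It only decides which caching mode to ask for; actually opening files with direct IO (and whether HDFS could honor that) is left out.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: choose a caching mode based on why a file is opened.
// None of these types exist in Lucene, HBase or HDFS.
public class CompactionIoPolicy {

    /** Hypothetical analogue of an IO context: the reason for the IO. */
    public enum IoPurpose { READ, FLUSH, COMPACTION }

    /** BUFFERED goes through the OS page cache; DIRECT stands for O_DIRECT-style bypass. */
    public enum CacheMode { BUFFERED, DIRECT }

    private final Map<IoPurpose, CacheMode> modes = new EnumMap<>(IoPurpose.class);

    /**
     * @param directIoAvailable whether the platform and filesystem support
     *                          direct IO (assumed to be probed elsewhere)
     */
    public CompactionIoPolicy(boolean directIoAvailable) {
        // Reads and flushes stay buffered so hot data remains in the page cache.
        modes.put(IoPurpose.READ, CacheMode.BUFFERED);
        modes.put(IoPurpose.FLUSH, CacheMode.BUFFERED);
        // Only compaction IO bypasses the cache, and only when that is possible.
        modes.put(IoPurpose.COMPACTION,
                directIoAvailable ? CacheMode.DIRECT : CacheMode.BUFFERED);
    }

    /** Returns the caching mode to request for the given purpose. */
    public CacheMode modeFor(IoPurpose purpose) {
        return modes.get(purpose);
    }

    public static void main(String[] args) {
        CompactionIoPolicy policy = new CompactionIoPolicy(true);
        for (IoPurpose purpose : IoPurpose.values()) {
            System.out.println(purpose + " -> " + policy.modeFor(purpose));
        }
    }
}
```

The point of keeping the decision in one place is that compaction IO can fall back to buffered IO where direct IO is not available, without touching the read path.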

Otis
----
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm 

>________________________________
> From: Lars George <[EMAIL PROTECTED]>
>To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>Sent: Saturday, July 7, 2012 3:01 AM
>Subject: Re: For HBase compactions - Lucene's IO impact reduction code
>
>Hi Otis,
>
>Throttling I think is a less needed feature as we typically struggle to keep up with the compaction queue under load. Reducing background noise caused by compactions is more an exercise of tuning the compaction algorithm itself. That is still somewhat of a black art it seems.
>
>As for the OS buffer bypassing, Todd did some work along these lines in HDFS, which helped speed up HBase (for CDH this went into CDH3u4). Not sure if it is really the same or not, so I'll leave this for someone else to comment on.
>
>But indeed interesting ideas and should be discussed thoroughly.
>
>Lars
>
>On Jul 7, 2012, at 7:49, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Here is something that may be of interest to HBase:
>>
>> Lucene 4.0.0-Alpha was recently released.  Mike McCandless, one of the Lucene developers, wrote a really nice post about new things in this version of Lucene.  The part that I think is interesting for HBase, and that HBase devs may want to look at (and borrow to use with compactions) is this:
>>
>> Reducing merge IO impact
>>
>> Merging (consolidating many small segments into a single big one) is a very IO and CPU intensive operation which can easily interfere with ongoing searches. In 4.0.0 we now have two ways to reduce this impact:
>>    * Rate-limit the IO caused by ongoing merging, by calling FSDirectory.setMaxMergeWriteMBPerSec.
>>    * Use the new NativeUnixDirectory which bypasses the OS's IO cache for all merge IO, by using direct IO. This ensures that a merge won't evict hot pages used by searches. (Note that there is also a native WindowsDirectory, but it does not yet use direct IO during merging... patches welcome!).
>>
>> Remember to also set swappiness to 0 on Linux if you want to maximize search responsiveness.
>>
>> More generally, the APIs that open an input or output file (Directory.openInput and Directory.createOutput) now take an IOContext describing what's being done (e.g., flush vs merge), so you can create a custom Directory that changes its behavior depending on the context.
>>
>> These changes were part of a 2011 Google Summer of Code project (thank you Varun!). 
>>
>> 
>>
>> Thoughts?
>>
>> Otis
>> ----
>> Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm
>
>
>
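
For reference, the rate-limiting idea from the quoted post (cap merge or compaction writes at some MB/sec) can be sketched in plain Java as below. This is only an illustration under my own assumptions, not how FSDirectory.setMaxMergeWriteMBPerSec is implemented: SimpleWriteRateLimiter and its methods are hypothetical names, and the clock is passed in explicitly so the arithmetic is easy to check.

```java
// Hypothetical sketch of a write rate limiter for background IO.
// Not taken from Lucene or HBase.
public class SimpleWriteRateLimiter {
    private final long bytesPerSec;
    private long nextAllowedNanos; // earliest time the next write may start

    public SimpleWriteRateLimiter(double mbPerSec, long nowNanos) {
        long rate = (long) (mbPerSec * 1024 * 1024);
        if (rate < 1) {
            throw new IllegalArgumentException("mbPerSec is too small: " + mbPerSec);
        }
        this.bytesPerSec = rate;
        this.nextAllowedNanos = nowNanos;
    }

    /**
     * Books a write of the given size and returns how many nanoseconds the
     * caller should wait before performing it. Intended for modest chunk
     * sizes (the multiplication below overflows for multi-gigabyte chunks).
     */
    public synchronized long reserve(long bytes, long nowNanos) {
        long start = Math.max(nowNanos, nextAllowedNanos);
        long costNanos = bytes * 1_000_000_000L / bytesPerSec;
        nextAllowedNanos = start + costNanos;
        return start - nowNanos;
    }

    /** Convenience wrapper: sleeps as long as needed before a write of this size. */
    public void pause(long bytes) throws InterruptedException {
        long waitNanos = reserve(bytes, System.nanoTime());
        if (waitNanos > 0) {
            Thread.sleep(waitNanos / 1_000_000L, (int) (waitNanos % 1_000_000L));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Limit to 8 MB/sec and write four 1 MB chunks.
        SimpleWriteRateLimiter limiter = new SimpleWriteRateLimiter(8.0, System.nanoTime());
        for (int i = 0; i < 4; i++) {
            limiter.pause(1024 * 1024);
            System.out.println("chunk " + i + " may be written now");
        }
    }
}
```

A compaction thread would call pause() before each chunk it writes. Whether this is worth doing at all is a separate question, given Lars's point that compactions already struggle to keep up under load.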