
Drill, mail # dev - quick question about the new SE iface


Re: quick question about the new SE iface
David Alves 2013-04-20, 23:21
Thanks for the quick reply and for the pointers.
Wrt question one, now that I think about it, that opens the door for some cool optimizations.

Wrt question two, I found a couple of interesting references on using MMAP'd tmpfs and am now looking into how JCuda uses pinned memory.
I'll also look into the Peter Lawrey references.

best
david

On Apr 20, 2013, at 6:00 PM, Jacques Nadeau <[EMAIL PROTECTED]> wrote:

> On Sat, Apr 20, 2013 at 2:39 PM, David Alves <[EMAIL PROTECTED]> wrote:
>
>> Hi
>>
>>        I'm porting the region-level HBase SE to the new SE iface and I
>> have a couple of questions.
>>        1- about the method: public ListMultimap<ReadEntry,
>> DrillbitEndpoint> getReadLocations(Collection<ReadEntry> entries)
>>
>>        When does it happen that a read entry gets assigned more than one
>> Drillbit?
>>        In terms of HBase, I can see the case where multiple read entries
>> get assigned to the same Drillbit (co-located regions), but I can't envision
>> a case where the same read entry (usually corresponding to a shard or
>> partition) gets assigned to multiple Drillbits. When can that happen?
>>
>
> The best example is probably HDFS block replicas: each block has multiple
> possible endpoints, one per replica location.
>
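To make the replica case concrete, here is a minimal sketch of what `getReadLocations` could return. It assumes simplified, hypothetical stand-ins for Drill's `ReadEntry` and `DrillbitEndpoint` types, uses a plain `Map<ReadEntry, List<DrillbitEndpoint>>` in place of Guava's `ListMultimap`, and assumes a simple matching rule (a Drillbit is a candidate if it runs on a host that holds a copy of the data). It is an illustration, not Drill's implementation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: ReadEntry and DrillbitEndpoint here are simplified stand-ins,
// and Map<K, List<V>> replaces Guava's ListMultimap so the example has no dependencies.
public class ReadLocationsSketch {

    // A unit of work (an HDFS block or an HBase region) plus the hosts holding its data.
    record ReadEntry(String id, List<String> hostsHoldingData) {}

    record DrillbitEndpoint(String host) {}

    // Maps each read entry to every Drillbit running on a host that has a copy of its data.
    static Map<ReadEntry, List<DrillbitEndpoint>> getReadLocations(
            Collection<ReadEntry> entries, Collection<DrillbitEndpoint> drillbits) {
        Map<ReadEntry, List<DrillbitEndpoint>> locations = new LinkedHashMap<>();
        for (ReadEntry entry : entries) {
            List<DrillbitEndpoint> candidates = new ArrayList<>();
            for (DrillbitEndpoint bit : drillbits) {
                if (entry.hostsHoldingData().contains(bit.host())) {
                    candidates.add(bit);
                }
            }
            locations.put(entry, candidates);
        }
        return locations;
    }

    public static void main(String[] args) {
        List<DrillbitEndpoint> bits = List.of(
                new DrillbitEndpoint("node1"),
                new DrillbitEndpoint("node2"),
                new DrillbitEndpoint("node3"));
        // An HDFS block replicated on three hosts: three possible endpoints.
        ReadEntry block = new ReadEntry("hdfs-block-0", List.of("node1", "node2", "node3"));
        // An HBase region served by a single RegionServer: one endpoint.
        ReadEntry region = new ReadEntry("hbase-region-a", List.of("node2"));
        getReadLocations(List.of(block, region), bits)
                .forEach((e, eps) -> System.out.println(e.id() + " -> " + eps.size() + " endpoint(s)"));
    }
}
```

An HBase region normally has a single serving RegionServer, which is why the one-to-many case does not show up in the HBase SE, while a replicated HDFS block naturally maps to several candidate Drillbits.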
>
>
>>
>>        2- with regard to off-heap storage and underlying SE co-location
>>
>>        This is not really a question; I'm just checking that my reasoning is
>> correct before going further.
>>
>>        For co-located underlying SEs and Drillbits, we should use
>> off-heap shared memory for IPC when possible, correct?
>>        Specifically, I'm investigating the possibility of having HBase
>> store region scan data directly off heap and making the results from HBase
>> contain a set of references to aligned shared memory locations.
>>        I'm not sure I'll be implementing this immediately, but I'd like to
>> design with it in mind if that is the idea.
>>        Also, this means that SEs must work in two modes: co-located with
>> shared memory and remote with sockets. We'd then have the
>>        Jacques: I'm sure you've put some thought into the underlying
>> mechanics of how to accomplish this; could you share some quick
>> ideas/references?
>>
>
> The challenge is that separate JVMs don't have a nice way to share memory. The
> simplest way is probably using MMAP'd tmpfs. We'd have to evaluate the
> performance impact of this complexity. I think Peter Lawrey has played with
> this in Java Chronicle, HugeCollections and his VanillaJava work.
> There isn't a lot of work in this space. Other interesting info:
> http://javaforu.blogspot.com/2011/09/offloading-data-from-jvm-heap-little.html
>
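The MMAP'd tmpfs approach can be sketched with the JDK's `FileChannel.map`. This is a minimal, hypothetical sketch: the file name, the layout (an `int` row count at offset 0 and `long` values from offset 8) and the fallback to `java.io.tmpdir` when `/dev/shm` is absent are all assumptions. It maps the same file twice inside one JVM to stand in for two processes; a real cross-JVM design would also need explicit synchronization between writer and reader, which this sketch omits.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of sharing data through a memory-mapped file on tmpfs.
public class TmpfsSharedMemorySketch {

    static final int REGION_SIZE = 4096;

    // Uses tmpfs (/dev/shm on Linux) when present, otherwise the JVM temp directory.
    static Path sharedFile(String name) {
        Path shm = Paths.get("/dev/shm");
        Path dir = Files.isDirectory(shm) ? shm : Paths.get(System.getProperty("java.io.tmpdir"));
        return dir.resolve(name);
    }

    // READ_WRITE mappings are shared: every mapping of the same file sees the same bytes.
    // The mapping stays valid after the channel is closed.
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, REGION_SIZE);
        }
    }

    // Writes values through one mapping and sums them through a second, independent mapping.
    // Supports up to (REGION_SIZE - 8) / 8 = 511 values.
    static long writeThenReadSum(Path file, long[] values) throws IOException {
        MappedByteBuffer producer = map(file);             // e.g. the storage engine side
        for (int i = 0; i < values.length; i++) {
            producer.putLong(8 + i * 8, values[i]);        // payload starts at offset 8
        }
        producer.putInt(0, values.length);                 // header (row count) written last

        MappedByteBuffer consumer = map(file);             // e.g. the Drillbit side
        int rows = consumer.getInt(0);
        long sum = 0;
        for (int i = 0; i < rows; i++) {
            sum += consumer.getLong(8 + i * 8);
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path file = sharedFile("drill-shm-demo.bin");
        try {
            System.out.println("sum=" + writeThenReadSum(file, new long[] {100L, 101L, 102L}));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```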
>
> Yes, this does mean that an SE may need to use two different mechanisms to
> interact: one local and one remote/fallback.
>
> J
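The point that an SE may need one local and one remote/fallback mechanism could be structured as a strategy chosen per read. In this minimal sketch, `ScanTransport`, `SharedMemoryTransport`, `SocketTransport` and `choose` are hypothetical names, not Drill APIs, and both transports are stubs that return fixed data instead of doing real IPC.

```java
import java.util.List;

// Hypothetical sketch of an SE with a local (shared-memory) path and a remote fallback.
public class TwoModeTransportSketch {

    // How a Drillbit pulls scan results from a storage engine (illustrative only).
    interface ScanTransport {
        String name();
        List<Long> fetch(String regionId);
    }

    // Local path: would read from a shared-memory region such as an mmap'd tmpfs file. Stubbed.
    static final class SharedMemoryTransport implements ScanTransport {
        public String name() { return "shared-memory"; }
        public List<Long> fetch(String regionId) { return List.of(1L, 2L, 3L); }
    }

    // Remote/fallback path: would stream results over a socket. Stubbed.
    static final class SocketTransport implements ScanTransport {
        public String name() { return "socket"; }
        public List<Long> fetch(String regionId) { return List.of(1L, 2L, 3L); }
    }

    // Uses the local mechanism only when both sides share a host and shared memory is usable.
    static ScanTransport choose(String drillbitHost, String seHost, boolean sharedMemoryAvailable) {
        if (drillbitHost.equals(seHost) && sharedMemoryAvailable) {
            return new SharedMemoryTransport();
        }
        return new SocketTransport();
    }

    public static void main(String[] args) {
        ScanTransport local = choose("node1", "node1", true);   // co-located
        ScanTransport remote = choose("node1", "node2", true);  // different hosts
        System.out.println(local.name() + " " + local.fetch("region-a"));
        System.out.println(remote.name() + " " + remote.fetch("region-a"));
    }
}
```

Keeping both paths behind one interface would let the co-located path be added later without changing callers, which fits the plan to design for shared memory before implementing it.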