Drill, mail # dev - quick question about the new SE iface


Re: quick question about the new SE iface
Jacques Nadeau 2013-04-20, 23:00
On Sat, Apr 20, 2013 at 2:39 PM, David Alves <[EMAIL PROTECTED]> wrote:

> Hi
>
>         I'm porting the region level HBase SE to the new SE iface and I
> have a couple of questions.
>         1- about the method: public ListMultimap<ReadEntry,
> DrillbitEndpoint> getReadLocations(Collection<ReadEntry> entries)
>
>         when does it happen that a read entry gets assigned more than one
> drillbit?
>         in terms of hbase I can see the case where multiple read entries
> get assigned to the same drillbit (co-located regions) but I can't envision
> a case where the same read entry (usually corresponding to a shard or
> partition) gets assigned to multiple drillbits. when can that happen?
>

The best example is probably HDFS: each block has several replica
locations, so a single read entry has multiple possible endpoints.
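
A minimal sketch of that case, assuming read entries are identified by a
block id and endpoints by a host name. The class name, the String stand-ins
for ReadEntry and DrillbitEndpoint, and the replicaHosts lookup are all
hypothetical, and a plain Map of Lists replaces the ListMultimap from the
real signature so the example needs only the JDK:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReadLocationsSketch {

    // Hypothetical stand-in for getReadLocations: every read entry (a block id)
    // maps to all hosts that hold a replica of that block.
    static Map<String, List<String>> getReadLocations(
            List<String> entries, Map<String, List<String>> replicaHosts) {
        Map<String, List<String>> locations = new LinkedHashMap<>();
        for (String entry : entries) {
            List<String> hosts =
                replicaHosts.getOrDefault(entry, Collections.<String>emptyList());
            // Copy so callers cannot mutate the replica lookup table.
            locations.put(entry, new ArrayList<>(hosts));
        }
        return locations;
    }

    public static void main(String[] args) {
        // Assumed replica layout: block-1 has three replicas, block-2 has two.
        Map<String, List<String>> replicas = new LinkedHashMap<>();
        replicas.put("block-1", List.of("node-a", "node-b", "node-c"));
        replicas.put("block-2", List.of("node-b", "node-d"));

        Map<String, List<String>> locations =
            getReadLocations(List.of("block-1", "block-2"), replicas);
        locations.forEach((entry, hosts) -> System.out.println(entry + " -> " + hosts));
    }
}
```

With region-level HBase each entry would normally come back with a single
endpoint, which is just the degenerate case of the same shape.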

>
>         2- with regard to off-heap storage and underlying SE co-location
>
>         this is not really a doubt, just checking that my reasoning is
> correct before.
>
>         for co-located underlying SE and Drillbits we should use
> off-heap, shared memory for IPC when possible, correct?
>         Specifically I'm investigating the possibility of having HBase
> store region scan data directly off heap and making the results from hbase
> contain a set of references to aligned shared memory locations.
>         I'm not sure I'll be implementing this immediately but I'd like to
> design accounting for it if that is the idea.
>         Also this means that SEs must work in two modes: co-located with
> shared memory and remote with sockets. We'd then have the
>         Jacques: I'm sure you've put some thought to the underlying
> mechanics on how to accomplish this, could you share some quick
> ideas/references?
>

The challenge is that separate JVMs don't have a nice way to share memory.
The simplest way is probably an mmap'd file on tmpfs.  We'd have to evaluate
the performance impact of this complexity.  I think the Java Chronicle,
HugeCollections or VanillaJava stuff by Peter Lawrey has played with this.
There isn't a lot of other work in this space.  Other interesting info:
http://javaforu.blogspot.com/2011/09/offloading-data-from-jvm-heap-little.html
Yes, this does mean that an SE may need to use two different mechanisms to
interact: one local and one remote/fallback.
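
A rough sketch of the mmap approach, using only java.nio. The class and
method names are hypothetical. The example maps a temp file twice within one
JVM purely to stay runnable; in the real setup the two mappings would live in
separate processes and the file would sit on a tmpfs mount (for example under
/dev/shm on Linux, which is an assumption about the deployment):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class SharedMapSketch {

    // Map the first `size` bytes of `file` read/write. In READ_WRITE mode the
    // file is extended if it is shorter than the requested region.
    static MappedByteBuffer map(Path file, int size) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temp file stands in for a file on a tmpfs mount.
        Path file = Files.createTempFile("se-shared-", ".buf");
        file.toFile().deleteOnExit();

        MappedByteBuffer writer = map(file, 4096); // e.g. the storage engine side
        MappedByteBuffer reader = map(file, 4096); // e.g. the Drillbit side

        writer.putLong(0, 42L);                    // write at a fixed, aligned offset
        System.out.println(reader.getLong(0));     // read through the other mapping
    }
}
```

This leaves out everything that makes it hard in practice: signalling between
the two sides, buffer lifetime, and the socket path for the remote case.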

J