NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Drill >> mail # dev >> quick question about the new SE iface


Re: quick question about the new SE iface
On Sat, Apr 20, 2013 at 2:39 PM, David Alves <[EMAIL PROTECTED]> wrote:

> Hi
>
>         I'm porting the region level HBase SE to the new SE iface and I
> have a couple of questions.
>         1- about the method: public ListMultimap<ReadEntry,
> DrillbitEndpoint> getReadLocations(Collection<ReadEntry> entries)
>
>         when does it happen that a read entry gets assigned more than one
> drillbit?
>         in terms of HBase I can see the case where multiple read entries
> get assigned to the same drillbit (co-located regions) but I can't envision
> a case where the same read entry (usually corresponding to a shard or
> partition) gets assigned to multiple drillbits. when can that happen?
>

The best example is probably HDFS, where a block's replica locations give a
single read entry multiple possible endpoints.
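
A rough sketch of that HDFS case, in case it helps. The types here are stand-ins and not Drill's: a plain Map<String, List<String>> plays the role of ListMultimap<ReadEntry, DrillbitEndpoint>, read entries are block ids, endpoints are host names, and the class name, block ids and host names are all made up for illustration. It only assumes that every replica host that also runs a drillbit is a candidate endpoint.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; these are not Drill's actual classes.
final class ReadLocationsSketch {

    // For each read entry (an HDFS block id here), collect every replica host
    // that also runs a drillbit. One entry can end up with several endpoints.
    static Map<String, List<String>> getReadLocations(
            Map<String, List<String>> replicaHostsByBlock, Set<String> drillbitHosts) {
        Map<String, List<String>> locations = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : replicaHostsByBlock.entrySet()) {
            List<String> candidates = new ArrayList<>();
            for (String host : entry.getValue()) {
                if (drillbitHosts.contains(host)) {
                    candidates.add(host);
                }
            }
            locations.put(entry.getKey(), candidates);
        }
        return locations;
    }

    public static void main(String[] args) {
        Map<String, List<String>> replicas = new LinkedHashMap<>();
        // block-0 is replicated three times, so it has three candidate endpoints.
        replicas.put("block-0", Arrays.asList("node1", "node2", "node3"));
        // block-1 has a replica on node4, which runs no drillbit in this example.
        replicas.put("block-1", Arrays.asList("node2", "node4"));
        Set<String> drillbits = new HashSet<>(Arrays.asList("node1", "node2", "node3"));

        System.out.println(getReadLocations(replicas, drillbits));
    }
}
```

An HBase region would normally map to a single endpoint under the same scheme, which is why the multi-valued case doesn't show up there.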

>
>         2- with regard to off-heap storage and underlying SE co-location
>
>         this is not really a doubt, just checking that my reasoning is
> correct before proceeding.
>
>         for co-located underlying SE and Drillbits we should use
> off-heap, shared memory for IPC when possible, correct?
>         Specifically I'm investigating the possibility of having HBase
> store region scan data directly off heap and making the results from HBase
> contain a set of references to aligned shared memory locations.
>         I'm not sure I'll be implementing this immediately but I'd like to
> design accounting for it if that is the idea.
>         Also this means that SEs must work in two modes: co-located with
> shared memory and remote with sockets. We'd then have the
>         Jacques: I'm sure you've put some thought to the underlying
> mechanics on how to accomplish this, could you share some quick
> ideas/references?
>

The challenge is that separate JVMs don't have a nice way to share memory.  The
simplest way is probably using mmap'd tmpfs.  We'd have to evaluate the
performance impact of this complexity.  I think the Java Chronicle,
HugeCollections or VanillaJava stuff by Peter Lawrey has played with this.
There isn't a lot of work in this space.  Other interesting info:
http://javaforu.blogspot.com/2011/09/offloading-data-from-jvm-heap-little.html.
Yes, this does mean that an SE may need to use two different mechanisms to
interact: one local and one remote/fallback.
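
To make the mmap idea a bit more concrete, here is a minimal sketch using only the standard java.nio API. Everything named here (SharedMemorySketch, REGION_SIZE, Transport, choose) is hypothetical and not part of Drill or HBase. It maps a temp file for portability; on Linux the file would presumably live on a tmpfs mount such as /dev/shm so the pages stay in memory, and the second mapping would be made by the other JVM. It also does no synchronization between writer and reader, which a real design would need.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not Drill or HBase code.
final class SharedMemorySketch {
    static final int REGION_SIZE = 4096; // assumed size of the shared region

    enum Transport { SHARED_MEMORY, SOCKET }

    // Local mode when the SE and the drillbit are on the same host,
    // remote/fallback mode otherwise.
    static Transport choose(String storageEngineHost, String drillbitHost) {
        return storageEngineHost.equals(drillbitHost) ? Transport.SHARED_MEMORY : Transport.SOCKET;
    }

    // Creates the backing file. A temp file is used here; a tmpfs path is the
    // assumption for a real co-located deployment.
    static Path newRegionFile() {
        try {
            Path file = Files.createTempFile("se-shared", ".bin");
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps the file read/write. The mapping stays valid after the channel is closed.
    static MappedByteBuffer map(Path file) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, REGION_SIZE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = newRegionFile();
        MappedByteBuffer writer = map(file); // the storage engine side
        MappedByteBuffer reader = map(file); // the drillbit side (another JVM in practice)

        writer.putLong(0, 12345L);           // stands in for scan data written off heap
        System.out.println("read back: " + reader.getLong(0));
        System.out.println("transport: " + choose("node1", "node1"));
    }
}
```

Whether this beats a loopback socket once serialization and coordination are accounted for is exactly the performance question that would need measuring.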

J