Hive >> mail # dev >> Review Request 18936: HIVE-6430 MapJoin hash table has large memory overhead


Re: Review Request 18936: HIVE-6430 MapJoin hash table has large memory overhead

This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18936/#review36680

ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java
<https://reviews.apache.org/r/18936/#comment67786>

    This would be much simpler if you split the details into two groups:
    
    1) Finding the key
    2) Finding the value(s)
    
    #1 is well understood for closed hash tables.
    
    #2 is where all the complexity is for this impl, with the multi-value linked list implemented via offsets.
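
    To illustrate the split, here is a minimal sketch. It is not the code under review: it uses int keys and values instead of byte ranges, and the class name, fields, and the newest-first chain order are all assumptions made for the example. Step 1 is an ordinary closed-hashing slot search; step 2 walks a chain of value records linked by offsets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the BytesBytesMultiHashMap implementation.
class TwoStepMultiMapSketch {
    private static final int EMPTY = -1; // "no value record" / end of chain
    private final int[] keys;            // slot -> key, valid only if heads[slot] != EMPTY
    private final int[] heads;           // slot -> offset of the newest value record
    private final int[] log;             // value records stored as [value, offsetOfNextRecord]
    private int logEnd = 0;
    private int usedSlots = 0;

    TwoStepMultiMapSketch(int capacityPow2, int maxValues) {
        if (capacityPow2 <= 1 || Integer.bitCount(capacityPow2) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two > 1");
        }
        keys = new int[capacityPow2];
        heads = new int[capacityPow2];
        Arrays.fill(heads, EMPTY);
        log = new int[2 * maxValues];
    }

    // Step 1: find the slot that holds `key`, or the empty slot where it would go.
    // Terminates because put() always leaves at least one slot empty.
    private int findSlot(int key) {
        int mask = keys.length - 1;
        int slot = (key ^ (key >>> 16)) & mask;
        while (heads[slot] != EMPTY && keys[slot] != key) {
            slot = (slot + 1) & mask; // linear probing keeps the sketch short
        }
        return slot;
    }

    void put(int key, int value) {
        if (logEnd + 2 > log.length) {
            throw new IllegalStateException("value log is full");
        }
        int slot = findSlot(key);
        if (heads[slot] == EMPTY) {
            if (usedSlots + 1 == keys.length) {
                throw new IllegalStateException("hash table is full");
            }
            keys[slot] = key;
            usedSlots++;
        }
        // Prepend the new record to this key's chain.
        log[logEnd] = value;
        log[logEnd + 1] = heads[slot];
        heads[slot] = logEnd;
        logEnd += 2;
    }

    // Step 2: follow the offset chain to collect every value for the key.
    List<Integer> get(int key) {
        List<Integer> out = new ArrayList<>();
        for (int off = heads[findSlot(key)]; off != EMPTY; off = log[off + 1]) {
            out.add(log[off]);
        }
        return out;
    }
}
```

    With this layout the probing logic never needs to know how values are chained, and the chain walk never needs to know how the slot was found.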

ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java
<https://reviews.apache.org/r/18936/#comment67737>

    This should be an IllegalArgumentException - we don't run asserts in production.
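
    As a sketch of the difference: Java skips `assert` statements unless the JVM is started with assertions enabled (for example with -ea), while an explicit check always runs. The condition below (a power-of-two capacity) and the method name are hypothetical, since the asserted condition is not quoted in this comment.

```java
// Hypothetical sketch; the real condition checked in the patch is not shown in this review.
class ArgumentCheckSketch {
    static int checkedCapacity(int capacity) {
        // An `assert` here would be a no-op unless assertions are enabled on the JVM.
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException(
                "capacity must be a positive power of two: " + capacity);
        }
        return capacity;
    }
}
```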

ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java
<https://reviews.apache.org/r/18936/#comment67740>

    Quadratic probing is much nicer for collisions.
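
    A minimal sketch of one common variant, assuming a power-of-two table size: the step grows by one on each probe, so the offsets from the home slot are the triangular numbers 0, 1, 3, 6, 10, ... With a power-of-two capacity this sequence visits every slot exactly once. The class and method names are hypothetical, and the comment does not say which quadratic scheme is meant.

```java
// Hypothetical sketch of triangular-number quadratic probing.
class QuadraticProbeSketch {
    // Returns the first `capacity` slots probed for `hash`, in probe order.
    static int[] probeSequence(int hash, int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        int mask = capacity - 1;
        int[] seq = new int[capacity];
        int slot = hash & mask;
        for (int i = 0; i < capacity; i++) {
            seq[i] = slot;
            slot = (slot + i + 1) & mask; // step sizes 1, 2, 3, ... instead of a constant 1
        }
        return seq;
    }
}
```

    Compared with linear probing, keys that collide on neighbouring home slots spread out quickly instead of building one long run.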

ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java
<https://reviews.apache.org/r/18936/#comment67742>

    On the cmpLength != keylength comparison: the keys cannot be equal if they are not byte-for-byte equal, right?
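
    One reading of this comment is that a length mismatch already rules out equality, so the byte loop only needs to run when the lengths match. A minimal sketch under that assumption follows; the method and parameter names are hypothetical and are not taken from the patch.

```java
// Hypothetical sketch of a length-first key comparison over byte ranges.
class KeyCompareSketch {
    static boolean keysEqual(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        if (aLen != bLen) {
            return false; // different lengths can never be byte-for-byte equal
        }
        for (int i = 0; i < aLen; i++) {
            if (a[aOff + i] != b[bOff + i]) {
                return false;
            }
        }
        return true;
    }
}
```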

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
<https://reviews.apache.org/r/18936/#comment67748>

    why is there an init()?

serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java
<https://reviews.apache.org/r/18936/#comment67736>

    Comment eaten up in diff?

- Gopal V
On March 8, 2014, 12:31 a.m., Sergey Shelukhin wrote: