Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Sqoop, mail # dev - Review Request 12089: SQOOP-1101 Sqoop2: Basic schema: Provide implementation of the proposed data types


+
Jarek Cecho 2013-06-25, 21:42
+
Abraham Elmahrek 2013-06-25, 23:27
+
Abraham Elmahrek 2013-06-26, 02:44
+
Jarek Cecho 2013-06-25, 23:47
+
Venkat Ranganathan 2013-06-26, 17:33
+
Abraham Elmahrek 2013-06-26, 08:31
+
Venkat Ranganathan 2013-06-26, 07:43
+
Abraham Elmahrek 2013-06-26, 18:44
Copy link to this message
-
Re: Review Request 12089: SQOOP-1101 Sqoop2: Basic schema: Provide implementation of the proposed data types
Abraham Elmahrek 2013-06-28, 22:07


> On June 25, 2013, 11:27 p.m., Abraham Elmahrek wrote:
> > Overall, looks good. I am mostly unsure about the use of 31 as the only prime number separating hash code values.
>
> Jarek Cecho wrote:
>     Hi Abe,
>     thank you very much for the review, appreciated! All the hashCode methods were generated by my IDE (IntelliJ IDEA) including the magical constant 31. I assume that this is best practice, however please correct me if I'm wrong.
>
> Abraham Elmahrek wrote:
>     The number 31 seems arbitrary. My understanding is that hash codes with multiple instance members involved require some level of separation. By consistently using the number 31, there is a small chance that there will be some overlap between the hash codes produced by the members. It should increase the chance of a collision... the amount it increases it by is likely small though if the members are strings. If the members are Boolean, the overlap is guaranteed... IE: class A: member1 = Boolean,  member1 = Boolean, hashCode = func { result = 0; if member1 != null, then result = member1.hashCode(); if member2 != null, then result = 31*result + member2.hashCode(); }. NOTE: Boolean in java has two values for its hash code.
>
> Venkat Ranganathan wrote:
>     I think the basic idea is to have an hash code that is somewhat similar to member1*31^(n-1) + member2*31^(n-2) + ... + member[n] using int arithmetic (where member1 is the hashcode of member object 1 for example computed similarly).  But it is done with integer arithmetic.   I have seen other numbers used, like say 37, but may be that has better dispersion but still will be an arbitrary number, right?
>
> Abraham Elmahrek wrote:
>     Venkat,
>     That seems right. It's just the implementation is a bit skeptical given null values in members effectively zero out an iteration providing more opportunity for collisions. IE the scenario I've stated above.
>    
>     Anyways, I don't think it matters too much. If you guys are fine with it. I am too.
>
> Venkat Ranganathan wrote:
>     I see what you are saying.  But is it not that initializing result to a non-zero value is what is needed here.   That is while computing hash code, initialize result to, say 1.
>    
>     result = 1
>     result = result * 31 + member1
>     result = result * 31 + member2
>    
>     Thanks
>
>
> Abraham Elmahrek wrote:
>     Ah yeah. Good point. Nevermind!
>
> Jarek Cecho wrote:
>     Thank you Abe and Venkat for the discussion. Is my understanding that current implementation of hashCode() method is good enough correct? Or would you prefer if I would do some changes there?

Seems good enough if the range is large enough! IE: having string members.
- Abraham
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12089/#review22380
-----------------------------------------------------------
On June 25, 2013, 9:42 p.m., Jarek Cecho wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12089/
> -----------------------------------------------------------
>
> (Updated June 25, 2013, 9:42 p.m.)
>
>
> Review request for Sqoop.
>
>
> Bugs: SQOOP-1101
>     https://issues.apache.org/jira/browse/SQOOP-1101
>
>
> Repository: sqoop-sqoop2
>
>
> Description
> -------
>
> This part of SQOOP-1073 is adding classes representing the data types proposed in SQOOP-515.
>
>
> Diffs
> -----
>
>   common/pom.xml 2921800
>   common/src/main/java/org/apache/sqoop/schema/Schema.java PRE-CREATION
>   common/src/main/java/org/apache/sqoop/schema/SchemaError.java PRE-CREATION
>   common/src/main/java/org/apache/sqoop/schema/type/AbstractComplexType.java PRE-CREATION
>   common/src/main/java/org/apache/sqoop/schema/type/AbstractDateTime.java PRE-CREATION
>   common/src/main/java/org/apache/sqoop/schema/type/AbstractNumber.java PRE-CREATION
+
Jarek Cecho 2013-06-28, 20:45
+
Venkat Ranganathan 2013-06-28, 22:24
+
Jarek Cecho 2013-06-29, 21:14
+
Jarek Cecho 2013-06-29, 21:14
+
Venkat Ranganathan 2013-07-01, 02:58