Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Sqoop >> mail # dev >> Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format


+
Hari Shreedharan 2013-07-25, 04:18
+
Hari Shreedharan 2013-07-26, 00:04
+
Hari Shreedharan 2013-07-27, 01:46
+
Hari Shreedharan 2013-07-27, 04:11
+
Hari Shreedharan 2013-08-01, 03:41
+
Venkat Ranganathan 2013-08-01, 04:42
+
Hari Shreedharan 2013-08-01, 18:34
+
Venkat Ranganathan 2013-08-01, 18:56
+
Hari Shreedharan 2013-08-01, 20:24
Copy link to this message
-
Re: Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12936/#review25304
-----------------------------------------------------------
Hi Hari,
thank you again for working on this JIRA and please accept my apologies for the late response. I've deep dived into the patch and I do have couple of comments and questions.
common/src/main/java/org/apache/sqoop/etl/io/DataReader.java
<https://reviews.apache.org/r/12936/#comment49646>

    Extreme nit: Read data from the execution framework...
    
    (not necessary MR)

common/src/main/java/org/apache/sqoop/etl/io/DataReader.java
<https://reviews.apache.org/r/12936/#comment49647>

    Extreme nit: Read data from execution framework...

common/src/main/java/org/apache/sqoop/etl/io/DataReader.java
<https://reviews.apache.org/r/12936/#comment49648>

    Nit: Read data from execution framework as a native format.
    
    This should be independent native format right?

common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java
<https://reviews.apache.org/r/12936/#comment49649>

    Extreme nit: Write an array of objects into the execution framework...

common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java
<https://reviews.apache.org/r/12936/#comment49650>

    Extreme nit: Write data into execution framework....

common/src/main/java/org/apache/sqoop/schema/type/Column.java
<https://reviews.apache.org/r/12936/#comment49651>

    Can we add descriptive java doc describing what exactly are expecting to validate?

connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
<https://reviews.apache.org/r/12936/#comment50066>

    Those imports seems to be unused.

connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
<https://reviews.apache.org/r/12936/#comment50067>

    We should also take into account exportJobConfiguration.table.schema when building the query.

connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java
<https://reviews.apache.org/r/12936/#comment50068>

    Let's clean up unused imports after this refactoring.

connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java
<https://reviews.apache.org/r/12936/#comment49652>

    This method seems to be generated automatically, but it seems to be removing the fail() call, is it intentional?

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment49656>

    The wiki [1] is also saying that the 0x5C should be substituted.
    
    1: https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment49658>

    Similarly as above, it seems that we are missing the \\ replacement for 0x5C.

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50050>

    Is it wise to consume the exception here without any counters or other reporting except of log message?
    
    It also seems that other parts of the code are not null safe, so this error handling will most likely just cause NPE somewhere else.

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment49660>

    The setData() method is conditionally running the validations, are they intentionally skipped here in setTextData()?

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment49661>

    I'm afraid that String.split() won't work correctly for following example input:
    
    1,"Hi, here is Jarcec"
    
    That contains two columns, but three columns will be created when splitting by comma.

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50051>

    Nit: Please pass the correct array size instead of the zero.

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50052>

    I would suggest to be strict here and accept only upper case variant of NULL.

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50053>

    I would suggest to die quickly for unsupported data types rather than just silently pass them as a un-escaped strings.

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50055>

    Is there a reason why to do our own join rather than StringUtils.join()?

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50056>

    Is expected and desirable that user can't reset schema to NULL?

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50035>

    Nit: Would it make more sense to have IntermediateDataFormat (without type here) or CSVIntermediateDataFormat since we are not allowing nothing else?

connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50036>

    This will convert the binary string to an array of 0 and 1, right? The intermediate data format is suggesting the MySQLdump approach that is serializing the bytes as they are though. The 0 and 1 seems to be quite ineffici
+
Jarek Cecho 2013-07-31, 23:25
+
Hari Shreedharan 2013-08-01, 03:40