Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Sqoop, mail # dev - Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format


Copy link to this message
-
Re: Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format
Jarek Cecho 2013-07-31, 23:25

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12936/#review24352
-----------------------------------------------------------
Hi Hari,
thank you very much for working on this significant feature! I've done high level review and I think that we are in a good shape - good work here! Couple of high level notes:
common/src/main/java/org/apache/sqoop/etl/io/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment48198>

    I would much rather see such class(es) in connector sdk than in common module (that is for example shared with client).

common/src/main/java/org/apache/sqoop/etl/io/DataReader.java
<https://reviews.apache.org/r/12936/#comment48200>

    Nit: Can we add java docs describing purpose of each method?

common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java
<https://reviews.apache.org/r/12936/#comment48201>

    Nit: Can we add java docs describing purpose of each method?

common/src/main/java/org/apache/sqoop/etl/io/IntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment48202>

    Nit: "the the"

common/src/main/java/org/apache/sqoop/etl/io/IntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment48203>

    Nit: Please add the java docs here as well.

connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
<https://reviews.apache.org/r/12936/#comment48244>

    I'm afraid that the schema generation for query based export will fail and I'm not sure how to implement that, so I we might need to remove that functionality.

connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
<https://reviews.apache.org/r/12936/#comment48247>

    Similar code is also in importer, so maybe we could refactore that into some Util/Helper class?

core/src/main/java/org/apache/sqoop/framework/FrameworkConstants.java
<https://reviews.apache.org/r/12936/#comment48249>

    Nit: This property seems to be used mainly within the Execute/Submission engine, so it's better placed in JobConstants instead.

execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java
<https://reviews.apache.org/r/12936/#comment48248>

    Can we throw more meaning full exception here? :-)

pom.xml
<https://reviews.apache.org/r/12936/#comment48197>

    We already have guava dependency in Hadoop 100 profile. If the goal here is to make it explicit, I would suggest to move it out from there and use the property guava.version to specify the version.
Jarcec

- Jarek Cecho
On July 27, 2013, 4:11 a.m., Hari Shreedharan wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12936/
> -----------------------------------------------------------
>
> (Updated July 27, 2013, 4:11 a.m.)
>
>
> Review request for Sqoop.
>
>
> Bugs: SQOOP-777
>     https://issues.apache.org/jira/browse/SQOOP-777
>
>
> Repository: sqoop-sqoop2
>
>
> Description
> -------
>
> Implemented a pluggable intermediate data format that decouples the internal representation of the data from the connector and the output formats. Connectors can choose to implement and support a format that is more efficient for them. Also separated the SqoopWritable so that we can use the intermediate data format independent of (current) Hadoop.
>
> I ran a full build - all tests including integration tests pass. I have not added any new tests, yet. I will add unit tests for the new classes. Also, I have not tried running this on an actual cluster - so things may be broken. I'd like some initial feedback based on the current patch.
>
> I also implemented escaping of characters. There is some work remaining to support binary format, but it is mostly integration, the basic implementation is in place.