Sqoop, mail # dev - Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format

Re: Review Request 12936: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format
Hari Shreedharan 2013-08-01, 18:34

> On Aug. 1, 2013, 4:42 a.m., Venkat Ranganathan wrote:
> > common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java, line 45
> > <https://reviews.apache.org/r/12936/diff/5/?file=332241#file332241line45>
> >
> >     Do you think this should be writeContent (or conversely the method in DataReader should be changed to readRecord instead of Content?)


Maybe I can make the javadoc clearer. The idea behind having readContent and writeContent in DataReader/DataWriter and in the IntermediateDataFormat is this: if we allow the connector to choose an IntermediateDataFormat, the connector can read/write in its native format and, once we make the serialization part of the OutputFormat pluggable, the serializer would also be able to read/write in the connector's native format. In that case, the native format may be able to pack several records into a single call efficiently, which is why I named it as such (all the other methods are record-oriented, while this one is not). Does that make the intent clearer?
- Hari
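
The distinction between record-oriented calls and a content-oriented call can be illustrated with a minimal sketch. This is not the code from the patch: the names SketchDataFormat, LineBatchFormat, SketchDataWriter, and the newline-delimited "native" format are all hypothetical, chosen only to show how one content call might carry several records while a record call carries exactly one.

```java
import java.util.ArrayList;
import java.util.List;

// Small demo driver; all types below are hypothetical stand-ins, not Sqoop classes.
class ContentVsRecordDemo {
    public static void main(String[] args) {
        SketchDataWriter writer = new SketchDataWriter(new LineBatchFormat());
        writer.writeRecord("1,'alice'");                  // record-oriented: one record
        writer.writeContent("2,'bob'\n3,'carol'");        // content-oriented: a batch
        System.out.println(writer.recordsWritten());
    }
}

// Assumed shape of a pluggable intermediate format (illustrative only).
interface SketchDataFormat {
    void setRecord(String textRecord);   // exactly one record in text form
    void setContent(String nativeData);  // native form, may hold several records
    String getContent();                 // data in the format's native form
    int recordCount();                   // number of records currently held
}

// Assumed native format: records separated by newlines.
final class LineBatchFormat implements SketchDataFormat {
    private final List<String> records = new ArrayList<>();

    @Override
    public void setRecord(String textRecord) {
        records.clear();
        records.add(textRecord);
    }

    @Override
    public void setContent(String nativeData) {
        records.clear();
        for (String line : nativeData.split("\n")) {
            if (!line.isEmpty()) {
                records.add(line);
            }
        }
    }

    @Override
    public String getContent() {
        return String.join("\n", records);
    }

    @Override
    public int recordCount() {
        return records.size();
    }
}

// Hypothetical writer that delegates to whichever format the connector chose.
final class SketchDataWriter {
    private final SketchDataFormat format;
    private int recordsWritten = 0;

    SketchDataWriter(SketchDataFormat format) {
        this.format = format;
    }

    // Record-oriented call: always hands over a single record.
    void writeRecord(String textRecord) {
        format.setRecord(textRecord);
        recordsWritten += format.recordCount();
    }

    // Content-oriented call: the native payload may contain many records.
    void writeContent(String nativeData) {
        format.setContent(nativeData);
        recordsWritten += format.recordCount();
    }

    int recordsWritten() {
        return recordsWritten;
    }
}
```

In this sketch the writer never parses the payload itself; only the format knows how many records a content call holds, which is the kind of decoupling the reply describes.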
On Aug. 1, 2013, 3:41 a.m., Hari Shreedharan wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12936/
> -----------------------------------------------------------
> (Updated Aug. 1, 2013, 3:41 a.m.)
> Review request for Sqoop.
> Bugs: SQOOP-777
>     https://issues.apache.org/jira/browse/SQOOP-777
> Repository: sqoop-sqoop2
> Description
> -------
> Implemented a pluggable intermediate data format that decouples the internal representation of the data from the connector and the output formats. Connectors can choose to implement and support a format that is more efficient for them. Also separated the SqoopWritable so that we can use the intermediate data format independent of (current) Hadoop.
> I ran a full build - all tests, including integration tests, pass. I have not added any new tests yet; I will add unit tests for the new classes. Also, I have not tried running this on an actual cluster, so things may be broken. I'd like some initial feedback based on the current patch.
> I also implemented escaping of characters. There is some work remaining to support the binary format, but it is mostly integration; the basic implementation is in place.
> Diffs
> -----
>   common/pom.xml db11b5b
>   common/src/main/java/org/apache/sqoop/etl/io/DataReader.java 3e1adc7
>   common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java d81364e
>   common/src/main/java/org/apache/sqoop/schema/type/Column.java 8b630b2
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java e0da80f
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java 7212843
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java 96818ba
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/util/InitializationUtils.java PRE-CREATION
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java aa1c4ff
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java a7ed6ba
>   connector/connector-sdk/pom.xml 4056e14
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java PRE-CREATION
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java PRE-CREATION
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/CSVIntermediateDataFormatTest.java PRE-CREATION