Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Sqoop >> mail # dev >> Review Request 16812: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format


Copy link to this message
-
Re: Review Request 16812: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format


> On Jan. 20, 2014, 5:33 p.m., Jarek Cecho wrote:
> > connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java, lines 63-75
> > <https://reviews.apache.org/r/16812/diff/2/?file=421130#file421130line63>
> >
> >     I'm afraid that this won't work correctly for the case that user will supplement insert query instead of table name.
> >    
> >     I'm thinking about utilizing PreparedStement.getMetaData() to get those information for arbitrary INSERT SQL, but I'm not sure about that.
> >
>
> Abraham Elmahrek wrote:
>     Do you mean a free form query? I completely agree. If we use PreparedStatement, then we don't have to execute the query to get the metadata in theory.
>    
>     Are we assuming that all datasources Sqoop interfaces with will have a JDBC client? Also, we'll have to go through the connector's executor somehow.

The Generic JDBC Connector is meant to be used only for JDBC compliant databases, so non JDBC based sources will have to came up with their own connector. I'm still not sure if the PreparedStatement will work for example on query "INSERT INTO table VALUES(?, ?,?)" that might be entered by user in the export case. Perhaps we should disable export with custom query for now and re-add it later after we figure out how to get schema for it. What do you think Abe?
> On Jan. 20, 2014, 5:33 p.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java, line 52
> > <https://reviews.apache.org/r/16812/diff/2/?file=421137#file421137line52>
> >
> >     Nit: Can we sort the bytes?
>
> Abraham Elmahrek wrote:
>     Actually, order matters here. If a newline (0x0A) is replaced with \n first, then when we replace all \ with \\, the \n will become \\n. This would legitimately overlap with the \n string.

Good catch, I've missed that!
> On Jan. 20, 2014, 5:33 p.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java, line 216
> > <https://reviews.apache.org/r/16812/diff/2/?file=421137#file421137line216>
> >
> >     Nit: Modify to SqoopException + provide meaningful error message.
>
> Abraham Elmahrek wrote:
>     Worry about validation later.

Make sense, let's just ensure that we will create follow up JIRAs for it later.
- Jarek
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16812/#review32293
-----------------------------------------------------------
On Jan. 13, 2014, 1:17 a.m., Abraham Elmahrek wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16812/
> -----------------------------------------------------------
>
> (Updated Jan. 13, 2014, 1:17 a.m.)
>
>
> Review request for Sqoop.
>
>
> Bugs: SQOOP-777
>     https://issues.apache.org/jira/browse/SQOOP-777
>
>
> Repository: sqoop-sqoop2
>
>
> Description
> -------
>
> Previous review from hari is in first iteration of this review only.
>
> commit 1c5122611fbd4a46a629421f7c55746cfc14f136
> Author: Hari Shreedharan <[EMAIL PROTECTED]>
> Date:   Fri Jan 10 15:07:08 2014 -0800
>
>     SQOOP-777. Sqoop2: Pluggable Intermediate Data Format
>    
>     Implemented a pluggable intermediate data format that decouples the internal representation of the data from the connector and the output formats. Connectors can choose to implement and support a format that is more efficient for them. Also separated the SqoopWritable so th
>    
>     I ran a full build - all tests including integration tests pass. I have not added any new tests, yet. I will add unit tests for the new classes. Also, I have not tried running this on an actual cluster - so things may be broken. I'd like some initial feedback based on the c
>    
>     I also implemented escaping of characters. There is some work remaining to support binary format, but it is mostly integration, the basic implementation is in place.