Sqoop, mail # dev - Review Request 12451: SQOOP-1049: Sqoop2: Record not imported if partition column value is NULL


Re: Review Request 12451: SQOOP-1049: Sqoop2: Record not imported if partition column value is NULL
Mengwei Ding 2013-07-12, 00:06


> On July 11, 2013, 9:54 p.m., Jarek Cecho wrote:
> > connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java, line 184
> > <https://reviews.apache.org/r/12451/diff/1/?file=319957#file319957line184>
> >
> >     I'm a bit concerned about using the count() aggregate function, as it might lead to another full table scan, which could significantly hurt performance. Maybe we could make the check for nulls in the split-by column optional?
>
> Mengwei Ding wrote:
>     Yes, this is an issue. I will use 'count(1)' instead.
>
> Jarek Cecho wrote:
>     I'm afraid that count(1) won't help either. If the database engine does not store the precise number of rows (as with InnoDB in MySQL), queries of the form "select count(*) from table" or "select count(1) from table" will result in a full table scan, which might be quite a heavy operation.

Yes, I did some research just now. In some databases, null values are not indexed, so retrieving all null values requires scanning the whole table. I just thought of another idea: we don't necessarily need to check whether the column has nulls; instead, we could always add an extra partition for nulls. This way, we reduce the number of full table scans to one, since we cannot avoid a full table scan entirely. By the way, what do you mean by making the null check on the split-by column optional?
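
To make the extra-partition idea concrete, here is a minimal sketch, not the actual patch. The class name NullPartitionSketch and the method buildConditions are hypothetical, and it assumes an integer split-by column whose min and max are already known. It always appends one "IS NULL" condition, so no count() query is needed to find out whether nulls exist.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the GenericJdbcImportPartitioner code.
public class NullPartitionSketch {

    /**
     * Splits the inclusive range [min, max] of an integer column into up to
     * numPartitions half-open ranges and always adds one extra condition that
     * selects the rows whose column value is NULL.
     * Note: the arithmetic may overflow for extreme min/max values.
     */
    public static List<String> buildConditions(String column, long min, long max,
                                               int numPartitions) {
        List<String> conditions = new ArrayList<>();
        long span = max - min + 1;
        long lower = min;
        for (int i = 0; i < numPartitions; i++) {
            // Exclusive upper bound of the i-th range; the last one is max + 1.
            long upper = min + span * (i + 1) / numPartitions;
            if (upper > lower) { // skip empty ranges when span < numPartitions
                conditions.add(lower + " <= " + column + " AND " + column + " < " + upper);
            }
            lower = upper;
        }
        // Unconditional extra partition: rows with NULL in the split-by column.
        conditions.add(column + " IS NULL");
        return conditions;
    }

    public static void main(String[] args) {
        for (String condition : buildConditions("id", 0, 9, 3)) {
            System.out.println(condition);
        }
    }
}
```

With min 0, max 9 and three partitions, this yields three range conditions plus the "id IS NULL" condition. If the column has no nulls, the extra partition simply returns no rows.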
- Mengwei
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12451/#review23028
-----------------------------------------------------------
On July 10, 2013, 7:02 p.m., Mengwei Ding wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12451/
> -----------------------------------------------------------
>
> (Updated July 10, 2013, 7:02 p.m.)
>
>
> Review request for Sqoop and Jarek Cecho.
>
>
> Bugs: SQOOP-1049
>     https://issues.apache.org/jira/browse/SQOOP-1049
>
>
> Repository: sqoop-sqoop2
>
>
> Description
> -------
>
> commit 47e73c30b49be0168459d76bf8993205c7a4f4fc
> Author: Mengwei Ding <[EMAIL PROTECTED]>
> Date:   Wed Jul 10 11:41:05 2013 -0700
>
>     SQOOP-1049: Sqoop2: Record not imported if partition column value is NULL
>
> :100644 100644 abcc89d... a940d15... M connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnectorConstants.java
> :100644 100644 671bb4a... d331ae8... M connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnectorError.java
> :100644 100644 96818ba... 357fefb... M connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java
> :100644 100644 4401800... ff80ed3... M connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportPartitioner.java
>
>
> Diffs
> -----
>
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnectorConstants.java abcc89d
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnectorError.java 671bb4a
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java 96818ba
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportPartitioner.java 4401800
>
> Diff: https://reviews.apache.org/r/12451/diff/
>
>
> Testing
> -------
>
> I ran a manual test in which I successfully imported a table with some null values in the partition column.
>
>
> Thanks,
>
> Mengwei Ding
>
>