[jira] [Created] (SQOOP-1277) Import not splitted when using --boundary-query
Porati Sébastien created SQOOP-1277:
---------------------------------------

             Summary: Import not splitted when using --boundary-query
                 Key: SQOOP-1277
                 URL: https://issues.apache.org/jira/browse/SQOOP-1277
             Project: Sqoop
          Issue Type: Bug
          Components: hive-integration
    Affects Versions: 1.4.4
         Environment: Amazon AWS
            Reporter: Porati Sébastien
            Priority: Critical
I am trying to import MySQL data into a Hive table using a custom boundary query. Result: Sqoop does not split the load into multiple queries, and the import takes too long.

My job creation command:
{code:none}
sqoop job -Dsqoop.metastore.client.record.password=true \
    --create importJobName -- import \
    --connect jdbc:mysql://some_jdbc_pram \
    --username user_name \
    --password MyPassword \
    --table my_table \
    --columns "collect_id,collected_data_id,value" \
    --boundary-query "SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name'" \
    --split-by column_name \
    --num-mappers X \
    --hive-import \
    --hive-overwrite \
    --hive-table hivedb.hibetable --as-textfile --null-string \\\\N --null-non-string \\\\N
{code}
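
For reference, a minimal Python sketch of the splitting expected here: given the minimum and maximum returned by the boundary query, the range is divided into one contiguous sub-range per mapper, and each sub-range becomes a predicate on the `--split-by` column. This is an illustration only, not Sqoop's implementation; `split_ranges` and `split_predicates` are hypothetical helpers, and integer boundaries are assumed.

```python
def split_ranges(lo, hi, num_splits):
    """Divide the closed integer interval [lo, hi] into at most
    num_splits contiguous, non-overlapping ranges.

    Hypothetical helper for illustration; not part of Sqoop.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")

    total = hi - lo + 1           # number of integer values to cover
    n = min(num_splits, total)    # never create empty ranges
    base, extra = divmod(total, n)

    ranges = []
    start = lo
    for i in range(n):
        # The first `extra` ranges take one additional value each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def split_predicates(column, lo, hi, num_splits):
    """Turn each range into a WHERE-clause predicate on the split column."""
    return [
        f"{column} >= {a} AND {column} <= {b}"
        for a, b in split_ranges(lo, hi, num_splits)
    ]
```

With boundaries 1 and 100 and four mappers, `split_ranges(1, 100, 4)` gives four ranges of 25 values each, so four queries would run in parallel instead of one.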
    
The following warning is displayed:
{code:none}
WARN db.DataDrivenDBInputFormat: Could not find $CONDITIONS token in query: SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name'; splits may not partition data.
{code}

I tried adding the $CONDITIONS token to the boundary query in the creation command:
{code:none}
--boundary-query "SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name' AND \$CONDITIONS" \
{code}

But the job execution failed:
{code:none}
INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name' AND $CONDITIONS
INFO mapred.JobClient: Cleaning up the staging area hdfs://10.34.140.108:9000/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201401311408_0025
ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.io.IOException: com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Unknown column '$CONDITIONS' in 'where clause'
{code}
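
The log above shows the literal `$CONDITIONS` token reaching MySQL unchanged. For free-form `--query` imports, Sqoop replaces this token with a per-split predicate before the statement is sent to the database; the log suggests no such replacement took place for the boundary query. A minimal Python sketch of that kind of placeholder substitution, for illustration only: `substitute_conditions` is a hypothetical helper, not Sqoop code.

```python
TOKEN = "$CONDITIONS"


def substitute_conditions(query, predicate):
    """Replace the $CONDITIONS placeholder with a concrete predicate.

    Hypothetical helper for illustration; not part of Sqoop.
    Raises ValueError if the query does not contain the placeholder.
    """
    if TOKEN not in query:
        raise ValueError(f"query does not contain {TOKEN}")
    # Parenthesize the predicate so it combines safely with other clauses.
    return query.replace(TOKEN, f"({predicate})")
```

If the token is sent to the database without this step, MySQL parses `$CONDITIONS` as a column name, which matches the `Unknown column '$CONDITIONS'` error in the log.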

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)