[jira] [Created] (SQOOP-1277) Import not split when using --boundary-query
Porati Sébastien created SQOOP-1277:
---------------------------------------

             Summary: Import not split when using --boundary-query
                 Key: SQOOP-1277
                 URL: https://issues.apache.org/jira/browse/SQOOP-1277
             Project: Sqoop
          Issue Type: Bug
          Components: hive-integration
    Affects Versions: 1.4.4
         Environment: Amazon AWS
            Reporter: Porati Sébastien
            Priority: Critical
I am trying to import MySQL data into a Hive table, using a custom boundary query. Result: Sqoop does not split the load into multiple queries, and the import takes too long.

My job creation command:
{code:none}
sqoop job -Dsqoop.metastore.client.record.password=true \
    --create importJobName -- import \
    --connect jdbc:mysql://some_jdbc_pram \
    --username user_name \
    --password MyPassword \
    --table my_table \
    --columns "collect_id,collected_data_id,value" \
    --boundary-query "SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name'" \
    --split-by column_name \
    --num-mappers X \
    --hive-import \
    --hive-overwrite \
    --hive-table hivedb.hibetable --as-textfile --null-string \\\\N --null-non-string \\\\N
{code}
    
The following warning is displayed:
{code:none}
WARN db.DataDrivenDBInputFormat: Could not find $CONDITIONS token in query: SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name'; splits may not partition data.
{code}
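For context, the warning relates to how data-driven splitting is supposed to work: the boundary query should return the min and max of the split column, and that range is then divided into one sub-range per mapper, each sub-range becoming a WHERE predicate on the data query. A minimal Python sketch of that range arithmetic (names are illustrative; this is not Sqoop's actual implementation):

```python
# Hypothetical sketch of how a data-driven input format might turn
# boundary-query results (min, max) into per-mapper WHERE clauses.
def integer_splits(column, lo, hi, num_mappers):
    """Divide the inclusive range [lo, hi] into num_mappers contiguous sub-ranges."""
    size = (hi - lo + 1) // num_mappers or 1
    splits = []
    start = lo
    while start <= hi:
        end = min(start + size - 1, hi)
        # The last split absorbs any remainder of the range.
        if len(splits) == num_mappers - 1:
            end = hi
        splits.append(f"{column} >= {start} AND {column} <= {end}")
        start = end + 1
    return splits

# Example: boundary query returned min=1, max=100, with 4 mappers.
for clause in integer_splits("column_name", 1, 100, 4):
    print(clause)
```

Each clause would drive one mapper's query, which is why a boundary query that Sqoop cannot pair with split predicates leaves the whole import in a single query.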

I tried adding the $CONDITIONS token to the boundary query in the creation command:
{code:none}
--boundary-query "SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name' AND \$CONDITIONS" \
{code}

But the job execution failed:
{code:none}
INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT min_value, max_value FROM sqoop_boundaries WHERE key_name = 'key.name' AND $CONDITIONS
INFO mapred.JobClient: Cleaning up the staging area hdfs://10.34.140.108:9000/mnt/var/lib/hadoop/tmp/mapred/staging/hadoop/.staging/job_201401311408_0025
ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.io.IOException: com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Unknown column '$CONDITIONS' in 'where clause'
{code}
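That failure is consistent with the boundary query being sent to MySQL verbatim: it is only meant to fetch the min/max values, so no token substitution is applied to it; $CONDITIONS substitution only happens on the data query of a free-form --query import. A small Python sketch of the distinction (hypothetical helper, not Sqoop code):

```python
# Hypothetical illustration: only the data query gets its $CONDITIONS
# token replaced by a per-split predicate; the boundary query is executed
# as-is, which is why MySQL sees the literal column '$CONDITIONS'.
def build_data_query(query, split_clause):
    # Token substitution happens here, and only here.
    return query.replace("$CONDITIONS", split_clause)

boundary = ("SELECT min_value, max_value FROM sqoop_boundaries "
            "WHERE key_name = 'key.name' AND $CONDITIONS")
data = "SELECT * FROM my_table WHERE $CONDITIONS"

# The data query becomes a valid per-mapper statement...
print(build_data_query(data, "column_name >= 1 AND column_name <= 25"))
# ...while the boundary string above would reach the database unmodified,
# token and all, producing the "Unknown column '$CONDITIONS'" error.
```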

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
