Re: problems importing from oracle\
Hi Bruce,

On Mon, Mar 5, 2012 at 6:50 PM, Bruce Bian <[EMAIL PROTECTED]> wrote:
> Any feedback on this? Shall I create a JIRA issue for it?
>
>
> On Thu, Mar 1, 2012 at 11:12 PM, Bruce Bian <[EMAIL PROTECTED]> wrote:
>>
>> Hi Jarcec,
>> After looking at the Sqoop code, I found that my earlier split-by problem
>> was caused by passing '--driver "oracle.jdbc.OracleDriver"' in the sqoop
>> command. The following code in org.apache.sqoop.manager.DefaultManagerFactory
>> instantiates a GenericJdbcManager instead of OracleManager, even when
>> --connection-manager OracleManager is specified:
>>
>>     SqoopOptions options = data.getSqoopOptions();
>>     String manualDriver = options.getDriverClassName();
>>     if (manualDriver != null) {
>>       // User has manually specified JDBC implementation with --driver.
>>       // Just use GenericJdbcManager.
>>       return new GenericJdbcManager(manualDriver, options);
>>     }
>> Is there any reason why the code above runs before the connection manager
>> specified by the user is initialized? Shouldn't it come after the
>> connection scheme has been checked?

This is by design of the current implementation. When you specify an
explicit driver, the built-in connection manager selection defaults to
the generic connection manager. This allows you to use Sqoop with
databases that have compliant JDBC drivers but are not directly
supported by Sqoop.
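
To illustrate the order of checks, here is a minimal Java sketch. It is a
simplified model, not Sqoop's actual code: the class ManagerSelectionSketch
and the method chooseManager are hypothetical, managers are represented only
by their names, and just the Oracle case is modeled, on the assumption that
the built-in selection keys off the jdbc:oracle: prefix of the connect string.

```java
/** Simplified model of the selection order described above; not Sqoop's real classes. */
public class ManagerSelectionSketch {

    /**
     * Returns the name of the manager this simplified model would pick,
     * or null when nothing in the sketch matches.
     */
    static String chooseManager(String manualDriver, String connectString) {
        if (manualDriver != null) {
            // An explicit --driver short-circuits every other check.
            return "GenericJdbcManager";
        }
        // Assumption: without --driver, the connect string scheme decides.
        if (connectString != null && connectString.startsWith("jdbc:oracle:")) {
            return "OracleManager";
        }
        return null;
    }

    public static void main(String[] args) {
        String url = "jdbc:oracle:thin:@10.239.47.36:1521/dx";
        // With --driver set, the generic manager is chosen.
        System.out.println(chooseManager("oracle.jdbc.OracleDriver", url)); // GenericJdbcManager
        // Without --driver, the Oracle scheme is matched.
        System.out.println(chooseManager(null, url)); // OracleManager
    }
}
```

In this model, dropping the driver argument is what lets the Oracle-specific
branch be reached at all.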

Is there any particular reason why you must specify the --driver
option? By default, the built-in Oracle connection manager will choose
the very driver you are trying to pass in.

Thanks,
Arvind
>>
>>
>> On Thu, Mar 1, 2012 at 5:17 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> Ignored :-)
>>>
>>> I do not believe that you're hitting exactly SQOOP-204. As far as I can
>>> tell, SQOOP-204 fails while creating the input splits for the job, whereas
>>> your job seems to die after it has started executing on the Hadoop cluster.
>>>
>>> I'm afraid that I do not know how to help you at the moment. Would you
>>> mind upgrading to the current 1.4.1 version?
>>>
>>> Jarcec
>>>
>>> On Thu, Mar 01, 2012 at 04:59:48PM +0800, Bruce Bian wrote:
>>> > Hi Jarek,
>>> > Please ignore my first problem (getting no HDFS results); it turned out
>>> > to be my silly mistake when copying the query. Sorry for the annoyance.
>>> > The second problem, with adding --split-by, looks like SQOOP-204, but
>>> > that should already be fixed in 1.3.0, and I'm using 1.3.0-cdh3u3. Or
>>> > isn't it?
>>> >
>>> > On Thu, Mar 1, 2012 at 4:10 PM, Bruce Bian <[EMAIL PROTECTED]>
>>> > wrote:
>>> >
>>> > > Also, when I add --split-by a.prod_inst_id to the sqoop command, as
>>> > > in:
>>> > > QUERY="SELECT a.*,
>>> > >
>>> > > b.acnt_no,b.addr_id,b.postcode,b.acnt_rmnd_tp,b.print_tp,b.media_type,
>>> > > c.cust_code,c.root_cust_code,
>>> > >
>>> > >
>>> > > d.mdf_name,d.sub_bureau_code,d.bureau_cd,d.adm_sub_bureau_name,d.bureau_name
>>> > > FROM prc_idap_pi_root a
>>> > >  LEFT OUTER JOIN prc_idap_pi_root_acnt b ON a.acnt_id=b.acnt_id
>>> > >  LEFT OUTER JOIN prc_idap_pi_root_cust c ON a.cust_id=c.cust_id
>>> > >  LEFT OUTER JOIN ocrm_vt_area d ON a.dev_area_id=d.area_id
>>> > > WHERE lst_upd_tmp >= (SELECT date_val - 1/240 FROM
>>> > > etl.etl_para_cfg_detail
>>> > > WHERE para_id=84) AND \$CONDITIONS"
>>> > > sqoop import \
>>> > > --verbose \
>>> > > --driver oracle.jdbc.OracleDriver \
>>> > > --connect jdbc:oracle:thin:@10.239.47.36:1521/dx \
>>> > > --username *** \
>>> > > --password ****** \
>>> > > --query "$QUERY" \
>>> > > --split-by a.prod_inst_id \
>>> > > --target-dir /home/wbian/test
>>> > >
>>> > > a "java.sql.SQLSyntaxErrorException: ORA-00933: SQL command not
>>> > > properly ended" exception is thrown.
>>> > > And I'm testing this sqoop command on a stand-alone hadoop node.
>>> > > The output when adding the split-by is as follows:
>>> > >
>>> > > 14/03/01 15:24:21 DEBUG tool.BaseSqoopTool: Enabled debug logging.
>>> > > 12/03/01 15:24