Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Sqoop >> mail # user >> Data mismatch when importing data from Oracle to Hive through Sqoop without an error


Copy link to this message
-
Re: Data mismatch when importing data from Oracle to Hive through Sqoop without an error
Yes, This is newline char in one of the text fields.

The data is split and forms two rows if the original field has one new line
char.  -hive-drop-import-delims is the helper.

Thanks,
Chalcy

On Wed, Mar 6, 2013 at 12:35 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]>wrote:

> Hi Ajit,
> I've seen similar issue many times. Does your table have textual data? If
> so, can it happen that your textual data contains hive delimiters like new
> line characters? Because if so then Sqoop might create two lines in for one
> single row in the table that will be consequently seen as two rows in Hive.
> As Hive will implicitly convert any invalid values into NULL, it would also
> explain your NULL values.
>
> Sqoop offers arguments --hive-drop-import-delims and
> --hive-delims-replacement to deal with this problem. More information can
> be found in Sqoop user guide [1].
>
> Jarcec
>
> Links:
> 1:
> http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_importing_data_into_hive
>
> On Wed, Mar 06, 2013 at 10:50:26PM +0530, abhijeet gaikwad wrote:
> > If you see Hadoop job counters in your logs - numbers are correct till
> that
> > point. So normal import in HDFS is working fine. Only reasonable
> > explanation is that there is an issue inserting data in Hive.
> >
> > Have a look at the file on HDFS it should contain the correct number of
> > records with correct data. But if you do "select * ..." from Hive console
> > it will give you NULL values for invalid data, as in the data that cannot
> > be converted to the respective data type of the column of Hive table.
> This
> > is because Hive imposes data type constraints on the data in files on
> HDFS.
> > Those extra records is a mystry for me too. I don't think actual file on
> > HDFS has extra records.
> >
> > Sqoop is creating Hive table incorrectly which **may** have caused this
> > issue. Create table manually with correct data types if equivalent
> > available or as string and try loading data. If this works fine we know
> > where the issue is!
> >
> > Thanks,
> > Abhijeet
> > On Wed, Mar 6, 2013 at 2:05 PM, Ajit Kumar Shreevastava <
> > [EMAIL PROTECTED]> wrote:
> >
> >
> > >  Hi Abhijeet,****
> > >
> > > ** **
> > >
> > > Thanks for your response.****
> > >
> > > If values that don’t fit in double must be getting inserted as Null is
> the
> > > case then count should not be mis-match in both the case.****
> > >
> > > Here the null value inserted are extra value apart from the other value
> > > which is already present in both Oracle Table and Hive table.****
> > >
> > > ** **
> > >
> > > Correct me if I am wrong in interpretation.****
> > >
> > > ** **
> > >
> > > Thanks and Regards,****
> > >
> > > Ajit Kumar Shreevastava****
> > >
> > > ** **
> > >
> > > *From:* abhijeet gaikwad [mailto:[EMAIL PROTECTED]]
> > > *Sent:* Wednesday, March 06, 2013 1:46 PM
> > > *To:* [EMAIL PROTECTED]
> > > *Cc:* [EMAIL PROTECTED]
> > > *Subject:* Re: Data mismatch when importing data from Oracle to Hive
> > > through Sqoop without an error****
> > >
> > >   ** **
> > >
> > > Sqoop maps numeric and decimal types (RDBMS) to double (Hive). I think
> the
> > > values that don't fit in double must be getting inserted as NULL.
> > > You can see this warning in your logs.
> > >
> > > Thanks,
> > > Abhijeet****
> > >
> > > On Wed, Mar 6, 2013 at 1:32 PM, Ajit Kumar Shreevastava <
> > > [EMAIL PROTECTED]> wrote:****
> > >
> > > Hi all,****
> > >
> > > I have notice one interesting thing in the below result-set.****
> > >
> > > I have fired one query in both Oracle and Hive shell and found the
> > > following result set:à****
> > >
> > >  ****
> > >
> > > *SQL> select count(1) from bttn*****
> > >
> > > *  2  where bttn_id is null or data_inst_id is null or scr_id is
> null;****
> > > *
> > >
> > > * *****
> > >
> > > *  COUNT(1)*****
> > >
> > > *----------*****
> > >
> > > *         0*****
> > >
> > > *hive> select count(1) from bttn*****
> > >
> > > *    > where bttn_id is null or data_inst_id is null or scr_id is
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB