Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> RE: hive sqoop question - could not post- the message came back undeplivered


Copy link to this message
-
RE: hive sqoop question - could not post- the message came back undeplivered
I tried creating the hive table like this,

Create table my table (..........)ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'  LINES TERMINATED BY '\n' STORED AS TEXTFILE;

And then I would add about 95 rows into this hive table and use the sqoop below to export.  It works, so we know data is intact.  Modify the number of rows in the hive table to be 100 and run the same export - fails with the error below.

sqoop export --connect 'jdbc:sqlserver://mymserverconnetion;username=myuserid;password=mypwd;database=mydb' --table mysqltable -m 1 --input-fields-terminated-by '\t' --input-lines-terminated-by '\n'  --export-dir /warehouse/stg_chalcy/

Here this is what happens, like you say, the entire out put is treated as one record.  With 95 rows, if I check the database table, I do get 95 rows.

Thanks,
Chalcy

From: Nitin Pawar [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 30, 2012 1:26 AM
To: [EMAIL PROTECTED]
Subject: Re: sqoop question - could not post- the message came back undeplivered

Hi Chalcy,

You may want to try setting up the column separator, record separator and line separator in your code.

Sometimes this also causes that the entire output is treated as a single record.
I am not much aware of sqoop but I did face these issues while migrating data from hive to mysql

Thanks,
Nitin

On Thu, Mar 29, 2012 at 10:08 PM, Chalcy Raja <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote:
Hi Nitin,

I have a hive table. I am trying to sqoop export the data from the hive hdfs location.  I am not using any array.  And therefore I do not understand why I hit the parameter limit.  It looks like sqoop exporting kind of merge all the rows. (rows * number of columns > 2100).

Thank you,
Chalcy

From: Nitin Pawar [mailto:[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>]
Sent: Thursday, March 29, 2012 10:03 AM
To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>
Subject: Re: sqoop question - could not post- the message came back undeplivered

Chalcy,

from sqoop side there is no issue with number of records.

this is the limitation with microsoft sql server. If you have arrays in your records etc, it might fail.

Others facing the same issue have resolved using this thread on msdn

http://social.msdn.microsoft.com/Forums/en-US/adodotnetentityframework/thread/145d6dfa-e523-4046-bafa-1ff0f6557634
On Thu, Mar 29, 2012 at 7:16 PM, Chalcy Raja <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote:
I am trying to do a sqoop export (data from hdfs hadoop to database). The table I am trying to export has 2 million rows.  The table has 20 fields. The sqoop command is successful if I did 10 rows till 95 rows.  When I try anything more than 95, the sqoop export fails with the following error.

By googling I get that this a dbms limitation. Is there anyway to configure to fix this error?  I am surprised that it works for few rows.

Any help is appreciated.

Thanks,
CHalcy

12/03/29 09:00:59 INFO mapred.JobClient: Task Id : attempt_201203230811_0539_m_000000_0, Status : FAILED
java.io.IOException: com.microsoft.sqlserver.jdbc.SQLServerException: The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Too many parameters were provided in this RPC request. The maximum is 2100.
       at com.cloudera.sqoop.mapreduce.AsyncSqlRecordWriter.close(AsyncSqlRecordWriter.java:189)
       at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:540)
       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:649)
       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
       at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:396)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
       at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect.
12/03/29 09:01:05 INFO mapred.JobClient: Task Id : attempt_201203230811_0539_m_000000_1, Status : FAILED
java.io.IOException: com.microsoft.sqlserver.jdbc.SQLServerException: The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Too many parameters were provided in this RPC request. The maximum is 2100.
       at com.cloudera.sqoop.mapreduce.AsyncSqlRecordWriter.close(AsyncSqlRecordWriter.java:189)

From: Thiruvel Thirumoolan [mailto:[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>]
Sent: Thursday, March 29, 2012 7:55 AM
To: [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>; [EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>
Subject: Re: Executing query and storing output on HDFS

This should help.

https://cwiki.apache.org/Hive/languagemanual-dml.html#LanguageManualDML-Wri
tingdataintofilesystemfromqueries
On 3/29/12 4:48 PM, "Paul Ingles" <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote:
Nitin Pawar

Nitin Pawar
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB