Sqoop user mailing list: sqoop exporting from hdfs to postgresql, using --staging-table option


Thread:
  Weihua Zhu - 2012-01-23, 23:10
  Kathleen Ting - 2012-01-24, 01:26
  Weihua Zhu - 2012-01-25, 01:13
  Srinivas Surasani - 2012-01-25, 23:36
  Kathleen Ting - 2012-01-26, 01:01
  Srinivas Surasani - 2012-01-26, 01:39
  Srinivas Surasani - 2012-01-26, 01:43
  abhijeet gaikwad - 2012-01-28, 15:12
  Srinivas Surasani - 2012-01-28, 16:46
  abhijeet gaikwad - 2012-01-28, 17:47
  Srinivas Surasani - 2012-01-28, 17:58
Re: sqoop exporting from hdfs to postgresql, using --staging-table option
As to what I understand: it is loading 1000 records because there are 1000
records in your input file. We cannot relate this to batch size, because
firing a SELECT COUNT(*) query gives you the total number of records inserted
into the table, not the batches in which they were inserted.
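
For example (the connection values below are placeholders), you could check
the final count from the command line with sqoop eval; it prints 1000 whether
the rows arrived as one 1000-row INSERT or as ten 100-row INSERTs:

  # placeholder connection values; substitute your own
  sqoop eval \
    --driver com.teradata.jdbc.TeraDriver \
    --connect jdbc:teradata://host/database=mydb \
    --username user --password pass \
    --query "SELECT COUNT(*) FROM target_table"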

I cannot comment on the last statement (#3) in your mail below, especially in
the context of Teradata. Open for discussion :)

Thanks,
Abhijeet Gaikwad

On Sat, Jan 28, 2012 at 11:28 PM, Srinivas Surasani <[EMAIL PROTECTED]> wrote:

>
> Abhijeet --
>
> 1) By default it is loading 1000 records, and as you mentioned earlier this
> can be tweaked using "sqoop.export.records.per.statement".
> 2) I just ran SELECT COUNT(*) on the Teradata table and saw 1000 records
> inserted in one go.
> 3) I believe setting the number of mappers > 1 works only for non-parallel
> databases.
>
> -- Srinivas
>
>
> On Sat, Jan 28, 2012 at 12:47 PM, abhijeet gaikwad <[EMAIL PROTECTED]> wrote:
>
>> Hi Srinivas,
>> Export with multiple mappers is allowed in SQOOP. I have exported data
>> into SQL Server as well as MySQL using multiple mappers.
>>
>> Regarding the issue you are facing, I have a few questions:
>> 1. How did you set the batch size of 1000 you are talking about?
>> 2. Can you share the SQOOP logs in detail?
>> 3. The deadlock issue seems to have been raised by Teradata. There is a
>> good chance that a Teradata admin will be able to resolve it.
>>
>> Thanks,
>> Abhijeet Gaikwad
>> On 28 Jan 2012 22:16, "Srinivas Surasani" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Abhijeet,
>>>
>>> Thanks for the information. I have one more question: are exports always
>>> done with one mapper? (I run into table deadlocks if the number of mappers
>>> is set to more than one.)
>>> Also, FYI: I have observed that the default number of rows inserted is 1000.
>>>
>>> Thanks,
>>> -- Srinivas
>>>
>>> On Sat, Jan 28, 2012 at 10:12 AM, abhijeet gaikwad <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi Srinivas,
>>>> Haven't played with the Teradata Connector, but in general there are two
>>>> properties that drive insertions (SQOOP export) into a table, namely:
>>>>
>>>> 1. "sqoop.export.records.per.statement": This property specifies the
>>>> number of records/rows to be inserted using a single INSERT statement.
>>>> The default value is 100.
>>>> 2. "sqoop.export.statements.per.transaction": This property specifies the
>>>> number of INSERT statements before a commit is fired - which you can call
>>>> the batch size. The default value is 100.
>>>>
>>>> You can use the -D Hadoop argument to specify these properties on the
>>>> command line, e.g. -Dsqoop.export.statements.per.transaction=50.
>>>>
>>>> NOTE: Make sure you use this argument (-D) before any of the SQOOP
>>>> tool-specific arguments. See the SQOOP User Guide for more details.
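>>>>
>>>> For example, both properties can be combined on a single command line,
>>>> with the -D options placed right after the tool name (the connection and
>>>> table values below are placeholders):
>>>>
>>>>   sqoop export \
>>>>     -Dsqoop.export.records.per.statement=500 \
>>>>     -Dsqoop.export.statements.per.transaction=50 \
>>>>     --driver com.teradata.jdbc.TeraDriver \
>>>>     --connect jdbc:teradata://host/database=mydb \
>>>>     --username user --password pass \
>>>>     --table target_table --export-dir /user/example/input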
>>>>
>>>> Thanks,
>>>> Abhijeet Gaikwad
>>>>
>>>> On 1/26/12, Srinivas Surasani <[EMAIL PROTECTED]> wrote:
>>>> > Kathleen,
>>>> >
>>>> > Any information on the below request?
>>>> >
>>>> > Hi All,
>>>> >
>>>> > I'm working on Hadoop CDH3 U0 and Sqoop CDH3 U2.
>>>> >
>>>> > I'm trying to export CSV files from HDFS to Teradata. It works well with
>>>> > the number of mappers set to "1" (with batch loading of 1000 records at
>>>> > a time). When I try increasing the number of mappers to more than one, I
>>>> > get the following error. Also, is it possible to configure the batch
>>>> > size at the time of export (from the command line)?
>>>> >
>>>> >
>>>> > sqoop export --verbose --driver com.teradata.jdbc.TeraDriver \
>>>> >   --connect jdbc:teradata://xxxx/database=xxxx --username xxxxx \
>>>> >   --password xxxxx --table xxxx --export-dir /user/surasani/10minutes.txt \
>>>> >   --fields-terminated-by '|' -m 4 --batch
>>>> >
>>>> > 12/01/24 16:17:21 INFO mapred.JobClient:  map 3% reduce 0%
>>>> > 12/01/24 16:17:48 INFO mapred.JobClient: Task Id :
>>>> > attempt_201112211106_68553_m_000001_2, Status : FAILED
>>>> > *java.io.IOException: java.sql.BatchUpdateException: [Teradata