Sqoop user mailing list: sqoop exporting from hdfs to postgresql, using --staging-table option


Weihua Zhu 2012-01-23, 23:10
Kathleen Ting 2012-01-24, 01:26
Weihua Zhu 2012-01-25, 01:13
Srinivas Surasani 2012-01-25, 23:36
Kathleen Ting 2012-01-26, 01:01
Srinivas Surasani 2012-01-26, 01:39
Srinivas Surasani 2012-01-26, 01:43
abhijeet gaikwad 2012-01-28, 15:12
Srinivas Surasani 2012-01-28, 16:46
abhijeet gaikwad 2012-01-28, 17:47
Srinivas Surasani 2012-01-28, 17:58

Re: sqoop exporting from hdfs to postgresql, using --staging-table option
As to what I understand:
It is loading 1000 records because there are 1000 records in your input
file. We cannot relate this to batch size, because firing a select count(*)
query will give you the total number of records inserted in the table, not
the batches in which they were inserted.

Cannot comment on the last statement (#3) in your mail below, especially in
the context of Teradata. Open for discussion :)

Thanks,
Abhijeet Gaikwad

On Sat, Jan 28, 2012 at 11:28 PM, Srinivas Surasani <[EMAIL PROTECTED]> wrote:

>
> Abhijeet --
>
> 1) By default it is loading 1000 records and, as you mentioned earlier, this
> can be tweaked using "sqoop.export.records.per.statement".
> 2) I just ran select count(*) on the Teradata table and saw 1000 records
> inserted in one go.
> 3) I believe setting the number of mappers > 1 works only for non-parallel
> databases.
>
> -- Srinivas
>
>
> On Sat, Jan 28, 2012 at 12:47 PM, abhijeet gaikwad <[EMAIL PROTECTED]
> > wrote:
>
>> Hi Srinivas,
>> Export with multiple mappers is allowed in SQOOP. I have exported data
>> into SQL Server as well as MySQL using multiple mappers.
>>
>> Regarding the issue you are facing, I have a few questions:
>> 1. How did you set the batch size of 1000 you are talking about?
>> 2. Can you share the SQOOP logs in detail?
>> 3. The deadlock issue seems to have been raised by Teradata. It may well
>> be that a Teradata admin will be able to resolve this issue.
>>
>> Thanks,
>> Abhijeet Gaikwad
>> On 28 Jan 2012 22:16, "Srinivas Surasani" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Abhijeet --,
>>>
>>>  Thanks for the information. I have one more question. Is the export
>>> always done with one mapper? (It runs into table deadlocks if the number
>>> of mappers is set to more than one.)
>>> Also, FYI: I have observed that the default number of rows inserted is 1000.
>>>
>>> Thanks,
>>> -- Srinivas
>>>
>>> On Sat, Jan 28, 2012 at 10:12 AM, abhijeet gaikwad <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> Hi Srinivas,
>>>> Haven't played with Teradata Connector, but in general there are two
>>>> properties that drive insertions (SQOOP export) into a table, namely:
>>>>
>>>> 1. "sqoop.export.records.per.statement" : This property is used to
>>>> specify the number of records/rows to be inserted using a single
>>>> INSERT statement. Default value is 100.
>>>> 2. "sqoop.export.statements.per.transaction" : This property is used
>>>> to specify the number the insert statements before a commit is fired -
>>>> which you can call batch size. Default value is 100.
>>>>
>>>> You can use the -D Hadoop argument to specify these properties on the
>>>> command line, e.g. -Dsqoop.export.statements.per.transaction=50
>>>>
>>>> NOTE: Make sure you use this argument (-D) before any of the SQOOP
>>>> tool-specific arguments. See the SQOOP User Guide for more details.
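>>>>
>>>> For illustration, a complete export invocation combining both properties
>>>> might look like the sketch below (the connection details, credentials,
>>>> table name, and export directory are placeholders, not values taken from
>>>> this thread):
>>>>
>>>> # generic -D arguments go right after the tool name, before the
>>>> # tool-specific arguments
>>>> sqoop export \
>>>>   -Dsqoop.export.records.per.statement=1000 \
>>>>   -Dsqoop.export.statements.per.transaction=50 \
>>>>   --driver com.teradata.jdbc.TeraDriver \
>>>>   --connect jdbc:teradata://host/database=db \
>>>>   --username user --password pass \
>>>>   --table target_table --export-dir /user/example/input \
>>>>   --fields-terminated-by '|' -m 1
>>>>
>>>> With these (assumed) values each INSERT statement would carry 1000 rows
>>>> and a commit would be fired after every 50 statements; adjust both
>>>> numbers to whatever batch size suits your target database.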
>>>>
>>>> Thanks,
>>>> Abhijeet Gaikwad
>>>>
>>>> On 1/26/12, Srinivas Surasani <[EMAIL PROTECTED]> wrote:
>>>> > Kathleen,
>>>> >
>>>> > Any information on the request below?
>>>> >
>>>> > Hi All,
>>>> >
>>>> > I'm working on Hadoop CDH3 U0 and Sqoop CDH3 U2.
>>>> >
>>>> > I'm trying to export CSV files from HDFS to Teradata. It works well
>>>> > with the number of mappers set to "1" (with batch loading of 1000
>>>> > records at a time). When I try increasing the number of mappers to more
>>>> > than one, I get the following error. Also, is it possible to configure
>>>> > the batch size at the time of export (from the command line)?
>>>> >
>>>> >
>>>> > sqoop export --verbose --driver com.teradata.jdbc.TeraDriver
>>>> > --connect jdbc:teradata://xxxx/database=xxxx --username xxxxx
>>>> > --password xxxxx --table xxxx --export-dir /user/surasani/10minutes.txt
>>>> > --fields-terminated-by '|' -m 4 --batch
>>>> >
>>>> > 12/01/24 16:17:21 INFO mapred.JobClient:  map 3% reduce 0%
>>>> > 12/01/24 16:17:48 INFO mapred.JobClient: Task Id :
>>>> > attempt_201112211106_68553_m_000001_2, Status : FAILED
>>>> > *java.io.IOException: java.sql.BatchUpdateException: [Teradata