Re: sqoop import - controlling destination file size
Hi Martin,
I'm afraid that Sqoop currently does not have such functionality.

Each mapper will always create its own output file. So if you know the number of rows up front, you can specify an appropriate number of mappers using the "-m" parameter.
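
As a rough sketch of that suggestion (not a built-in Sqoop feature), the mapper count can be derived from the row count and the desired rows per file, then passed to "-m". The helper names, the JDBC URL, the table name, the split column, and the target directory below are all hypothetical placeholders. Note that Sqoop divides work by ranges of the split column, so each file only holds roughly N rows when that column's values are evenly distributed.

```python
def mappers_for_rows(total_rows: int, rows_per_file: int) -> int:
    """Return how many mappers (one output file each) are needed so that
    each file holds about rows_per_file rows, rounding up."""
    if total_rows <= 0 or rows_per_file <= 0:
        raise ValueError("total_rows and rows_per_file must be positive")
    # Integer ceiling division, avoids floating-point rounding.
    return (total_rows + rows_per_file - 1) // rows_per_file


def sqoop_import_args(connect, table, target_dir, num_mappers, split_by=None):
    """Build the argument list for a 'sqoop import' call (hypothetical helper)."""
    args = [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
        "-m", str(num_mappers),  # one output file per mapper
    ]
    if split_by:
        # Column whose value range Sqoop uses to divide rows among mappers.
        args += ["--split-by", split_by]
    return args


# Example from the question: 1,000,000 rows, about 100,000 rows per file.
num_mappers = mappers_for_rows(1_000_000, 100_000)

# All connection details below are placeholders, not values from this thread.
cmd = sqoop_import_args(
    connect="jdbc:derby://localhost:1527/mydb",
    table="MYTABLE",
    target_dir="/user/example/mytable",
    num_mappers=num_mappers,
    split_by="ID",
)
print(" ".join(cmd))
```

The printed command can be run from a shell once the placeholders are replaced. If exact row counts per file are required, post-processing the output would still be needed.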

Jarcec

On Wed, Jan 23, 2013 at 04:05:47PM -0600, Corbett Martin wrote:
> Hello,
>
> I'm importing data from apache-derby into HDFS using sqoop and it's working, but I have one question.
>
> Using this version of sqoop
> sqoop-1.4.1+54-1.cdh4.1.2.p0.21.sles11
>
> How can I control the destination file sizes of the import?  For example, I want to import data from a table with millions of records and have it create destination files of either A) N rows or B) N bytes, preferably N rows.  So, for example, I want an import of a table with 1,000,000 rows to create 10 files in HDFS, each containing 100,000 rows.  Is this possible with sqoop, or must I write my own post-processing code?
>
> Thanks in advance
> ~Corbett Martin
> Software Architect
> AbsoluteAR Accounts Receivable Services - An NHIN Solution