Sqoop >> mail # user >> Using Sqoop incremental import as chunk


Re: Using Sqoop incremental import as chunk
That's the only way I see you being able to achieve this, yes.

(Assuming you want many separate sequential imports, because if importing
the chunks in parallel is fine with you then you could use a single sqoop
command and let the size of your chunks be a by-product of the number of
mappers you choose.)
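
For illustration, here is a minimal sketch of that single-command, parallel approach; the JDBC URL, credentials, table name, and target directory are placeholders, not details from this thread:

  # Hypothetical connection details; adjust for your own database.
  # With a numeric --split-by column and 5 mappers over ids 1-100,
  # each mapper imports roughly a 20-row slice in parallel.
  sqoop import \
    --connect jdbc:mysql://dbhost/mydb \
    --username dbuser -P \
    --table my_table \
    --split-by id \
    --num-mappers 5 \
    --target-dir /user/hadoop/my_table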

--
Felix
On Wed, May 8, 2013 at 2:17 PM, Tanzir Musabbir <[EMAIL PROTECTED]> wrote:

> Thanks a lot Felix & Jarcec. So it looks like, if I am running an Oozie
> coordinator job which periodically imports data in chunks through Sqoop,
> before calling the Sqoop action I need to change the boundary query value
> every time. For example:
>
> --boundary-query 'select 1,20' - for the 1st run
> --boundary-query 'select 21,40' - for the 2nd run
>
> Please correct me if I'm wrong. Thanks again.
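
For a rough idea of what one such coordinator run could execute (the connection details and paths below are placeholders, and the Oozie parameterization itself is not shown):

  # Hypothetical second run: import only rows 21-40.
  # --boundary-query supplies the min and max split values, so the
  # mappers' generated WHERE clauses stay within that range.
  sqoop import \
    --connect jdbc:mysql://dbhost/mydb \
    --username dbuser -P \
    --table my_table \
    --split-by id \
    --boundary-query 'SELECT 21, 40' \
    --target-dir /user/hadoop/my_table/run2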
>
>
> > Date: Wed, 8 May 2013 11:08:05 -0700
> > From: [EMAIL PROTECTED]
> > To: [EMAIL PROTECTED]
> > Subject: Re: Using Sqoop incremental import as chunk
>
> >
> > Hi Tanzir,
> > incremental import does not work in chunks; it always imports
> > everything since the last import, i.e. everything from --last-value up.
> > You can simulate the chunks if needed using the --boundary-query
> > argument, as Felix advised.
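
As a sketch of that behavior (placeholder connection details again), an incremental append import pulls every row above the last value rather than a fixed-size chunk:

  # Hypothetical incremental run: imports ALL rows with id > 40,
  # i.e. everything added since the previous import.
  sqoop import \
    --connect jdbc:mysql://dbhost/mydb \
    --username dbuser -P \
    --table my_table \
    --incremental append \
    --check-column id \
    --last-value 40 \
    --target-dir /user/hadoop/my_table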
> >
> > Jarcec
> >
> > On Wed, May 08, 2013 at 01:46:47PM -0400, Felix GV wrote:
> > > --boundary-query
> > >
> > >
> > > http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_connecting_to_a_database_server
> > >
> > > --
> > > Felix
> > >
> > >
> > > On Wed, May 8, 2013 at 1:00 PM, Tanzir Musabbir <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hello everyone,
> > > >
> > > > Is it really possible to import data chunk-wise through Sqoop
> > > > incremental import?
> > > >
> > > > Say I have a table with ids 1, 2, 3, ... N (here N is 100) and now I
> > > > want to import it in chunks. For example:
> > > > 1st import: 1, 2, 3, ... 20
> > > > 2nd import: 21, 22, 23, ... 40
> > > > last import: 81, 82, 83, ... 100
> > > >
> > > > I have read about the Sqoop job with incremental import and also know
> > > > the --last-value parameter, but I do not know how to pass the chunk
> > > > size. For the above example, the chunk size is 20.
> > > >
> > > > Any information will be highly appreciated. Thanks in advance.
>