Sqoop user mailing list

RE: Using Sqoop incremental import as chunk
Sure, Jarcec. We would like to import data from Oracle 4-5 times a day and then process it (analytics) about the same number of times. Each chunk may have around 10 million records. Records are continuously added to that table, and sometimes the number added within a given time frame may exceed 10 million. In that case we will not import all of the new records; instead we will import only 10 million of them. That's why we are trying to import them in chunks.
Tanzir
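A minimal Python sketch of the capping rule described above, assuming the table has a numeric, increasing ID column and that the last imported ID is persisted somewhere between Oozie runs (both are assumptions; `next_chunk` is a hypothetical helper, not part of Sqoop). Each run takes the IDs after the last imported one, but never spans more than 10 million of them, so a backlog is drained over several runs.

```python
from typing import Optional, Tuple

CAP = 10_000_000  # maximum number of IDs per import, from the use case above


def next_chunk(last_imported_id: int, current_max_id: int,
               cap: int = CAP) -> Optional[Tuple[int, int]]:
    """Return the inclusive (lo, hi) ID range for the next run, or None if nothing is new.

    Hypothetical helper: current_max_id would come from something like
    SELECT MAX(ID) issued at the start of the run.
    """
    if current_max_id <= last_imported_id:
        return None  # no rows added since the previous run
    lo = last_imported_id + 1
    hi = min(last_imported_id + cap, current_max_id)  # never span more than `cap` IDs
    return lo, hi


# Simulate four scheduled runs while the table keeps growing.
last_id = 0
for max_id in (4_000_000, 25_000_000, 25_000_000, 31_000_000):
    chunk = next_chunk(last_id, max_id)
    if chunk is not None:
        print("import IDs %d..%d" % chunk)
        last_id = chunk[1]  # a real coordinator would persist this between runs
```

The cap bounds the ID span rather than the exact row count: if IDs have gaps (for example from a cached Oracle sequence or deleted rows), a run imports at most 10 million records, possibly fewer.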

> Date: Wed, 8 May 2013 11:23:25 -0700
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: Using Sqoop incremental import as chunk
>
> Hi Tanzir,
> would you mind describing your use case in a bit more detail? Is there a reason why you do not want your Oozie job to import all missing data?
>
> Jarcec
>
> On Thu, May 09, 2013 at 12:17:03AM +0600, Tanzir Musabbir wrote:
> > Thanks a lot, Felix & Jarcec. So it looks like, if I am running an Oozie coordinator job that periodically imports chunks of data through Sqoop, I need to change the boundary query value every time before calling the Sqoop action, like:
> > --boundary-query 'select 1,20' - for the 1st run
> > --boundary-query 'select 21,40' - for the 2nd run
> > Please correct me if I'm wrong. Thanks again.
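The per-run boundary queries described above can be generated rather than edited by hand. A minimal Python 3.8+ sketch, assuming a numeric split column named ID and a fixed chunk size; the connect string, table name, HDFS path, and the `sqoop_chunk_args` helper are hypothetical placeholders. Because the source database in this thread is Oracle, the sketch appends `FROM dual`, since Oracle releases before 23c reject a SELECT without a FROM clause. It only builds the argument list; an Oozie Sqoop action or a subprocess call would actually run it.

```python
import shlex
from typing import List


def sqoop_chunk_args(run: int, chunk_size: int, connect: str, table: str,
                     split_col: str = "ID",
                     target_root: str = "/data/chunks") -> List[str]:
    """Build `sqoop import` arguments for the 1-based `run`, covering IDs lo..hi."""
    lo = (run - 1) * chunk_size + 1  # run 1 -> 1, run 2 -> 21 for chunk_size 20
    hi = run * chunk_size            # run 1 -> 20, run 2 -> 40
    return [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--split-by", split_col,
        # Sqoop treats the two values returned here as the min/max of the
        # split column when computing splits, so rows outside lo..hi are skipped.
        "--boundary-query", f"SELECT {lo}, {hi} FROM dual",
        # One directory per run: by default an import fails if the target exists.
        "--target-dir", f"{target_root}/run_{run:04d}",
    ]


args = sqoop_chunk_args(2, 20, "jdbc:oracle:thin:@//dbhost:1521/ORCL", "EVENTS")
print(shlex.join(args))
```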
> >
> > > Date: Wed, 8 May 2013 11:08:05 -0700
> > > From: [EMAIL PROTECTED]
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: Using Sqoop incremental import as chunk
> > >
> > > Hi Tanzir,
> > > incremental import does not work in chunks; it always imports everything since the last import, i.e. everything above --last-value. You can simulate chunks if needed using the --boundary-query argument, as Felix advised.
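To make the difference concrete, here is a small Python model over in-memory IDs (no database involved; the function names are hypothetical). It approximates `--incremental append` as taking every row whose check column is above `--last-value`, up to the column's maximum when the import starts, whereas a simulated chunk adds an explicit upper bound.

```python
from typing import List


def incremental_append(ids: List[int], last_value: int) -> List[int]:
    # Models --incremental append: every ID above --last-value, up to the
    # maximum present when the import starts.
    upper = max(ids, default=last_value)
    return [i for i in ids if last_value < i <= upper]


def simulated_chunk(ids: List[int], last_value: int, chunk_size: int) -> List[int]:
    # Same lower bound, plus an upper bound such as one a --boundary-query supplies.
    return [i for i in ids if last_value < i <= last_value + chunk_size]


ids = list(range(1, 101))
print(len(incremental_append(ids, 20)))  # 80 rows: everything after ID 20
print(simulated_chunk(ids, 20, 20))      # IDs 21..40 only
```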
> > >
> > > Jarcec
> > >
> > > On Wed, May 08, 2013 at 01:46:47PM -0400, Felix GV wrote:
> > > > --boundary-query
> > > >
> > > > http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_connecting_to_a_database_server
> > > >
> > > > --
> > > > Felix
> > > >
> > > >
> > > > On Wed, May 8, 2013 at 1:00 PM, Tanzir Musabbir <[EMAIL PROTECTED]> wrote:
> > > >
> > > > >  Hello everyone,
> > > > >
> > > > > Is it possible to import data in chunks through Sqoop incremental
> > > > > import?
> > > > >
> > > > > Say I have a table with IDs 1, 2, 3, ..., N (here N is 100) and I want to
> > > > > import it in chunks, like:
> > > > > 1st import: 1,2,3.... 20
> > > > > 2nd import: 21,22,23.....40
> > > > > last import: 81,82,83....100
> > > > >
> > > > > I have read about the Sqoop job with incremental import and also know the
> > > > > --last-value parameter but do not know how to pass the chunk size. For the
> > > > > above example, chunk size here is 20.
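The chunk boundaries in the example above (N = 100, chunk size 20) can be computed with a short Python helper. `chunk_ranges` is a hypothetical name, and the last chunk is simply shorter when N is not a multiple of the chunk size; each resulting pair would supply one run's lower and upper ID bounds.

```python
from typing import List, Tuple


def chunk_ranges(first_id: int, last_id: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split the inclusive ID range first_id..last_id into chunks of at most chunk_size IDs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    lo = first_id
    while lo <= last_id:
        hi = min(lo + chunk_size - 1, last_id)  # inclusive upper bound of this chunk
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


print(chunk_ranges(1, 100, 20))
# [(1, 20), (21, 40), (41, 60), (61, 80), (81, 100)]
```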
> > > > >
> > > > >
> > > > > Any information will be highly appreciated. Thanks in advance.
> > > > >
> >