Sqoop >> mail # user >> Sqoop incremental import ( can any just help me out)


Re: Sqoop incremental import ( can any just help me out)
Thanks a lot, Devin.

Yes, my column has increasing values. Take the date column used for the
daily pull: it changes every day, and there is another column that stores
each date in Julian format, so it keeps changing as well.
What I meant is that the column I have used for split-by so far keeps
changing, and the column I am planning to use for split-by also keeps changing.

So will it be safe to change the split-by from the older column (changing
values) to the new column (values changing at a different rate)?
Please suggest.

Thanks,
yogesh

On Tue, Dec 31, 2013 at 1:27 AM, Devin Suiter RDX <[EMAIL PROTECTED]> wrote:

> If it's kind of a risk, and you can't take any chances...Why are you
> testing in that environment?
>
> Why not set up a VM with a test database, and a VM with a pseudo-cluster,
> and load a subset of your data, and experiment in a development environment
> so that you can know for sure - even if someone guarantees you the answer
> on here, you can not be certain everything is identical across all the
> versions of Sqoop, Hadoop, etc for them as it would be for you...if the
> data you are working with has value, you should find a safe way to
> experiment rather than trust your valuable data to the mailing list answers.
>
> Now, in answer to your question:
>
> According to my peer (I am not the Sqoop person where I work) if your
> incremental split is on a column that has increasing values, you can safely
> split on that, but if the value you split on is always the same, it is a
> bad choice for incremental splitting - he uses a datetime column I believe,
> and then the import is from the last imported datetime value up to the
> current max. I am not sure if that helps your case, but it is my hope that
> you find it useful.
>
> *Devin Suiter*
> Jr. Data Solutions Software Engineer
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>
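To illustrate the point about split columns: Sqoop asks the database for the minimum and maximum of the --split-by column and divides that range among the map tasks. The sketch below is a simplified shell approximation of that idea for an integer column, not Sqoop's actual splitter code. The function name `split_ranges` and the column `status_code` are hypothetical and introduced only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print one WHERE-clause range per map task for an integer split column.
# Simplified illustration only; this is NOT Sqoop's real splitter.
split_ranges() {
  local col="$1" min="$2" max="$3" mappers="$4"
  if (( min == max )); then
    # Every row has the same value, so there is no range to divide
    # and a single task ends up doing all the work.
    echo "$col = $min"
    return
  fi
  local span=$(( max - min ))
  local step=$(( (span + mappers - 1) / mappers ))   # ceiling division
  local lo=$min hi
  while (( lo < max )); do
    hi=$(( lo + step ))
    if (( hi >= max )); then
      # Last range is closed so the maximum value is included.
      echo "$col >= $lo AND $col <= $max"
      break
    fi
    echo "$col >= $lo AND $col < $hi"
    lo=$hi
  done
}

# A column with a wide range of values splits evenly across 4 tasks.
split_ranges empno 1 1000 4
# A column whose value never varies (hypothetical) cannot be split at all.
split_ranges status_code 7 7 4
```

A column whose values are spread over a wide range yields several ranges of similar size, while a constant column collapses to a single range, which is why it is a poor choice for --split-by.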
>
> On Mon, Dec 30, 2013 at 2:27 PM, yogesh kumar <[EMAIL PROTECTED]> wrote:
>
>> Thanks Chalcy, I got your point; let me try a simple test for it. But the
>> situation here is that for the incremental import I have to change the
>> split-by column.
>>
>> It is a bit of a risk and I cannot take a chance. I just want to be sure
>> that it will not affect the Hive table, or the data already in it, after
>> the incremental import. My incremental import will pull the data directly
>> and put it where my old sqooped data resides.
>>
>> I would like suggestions from the Sqoop champions.
>> Please help me out.
>>
>>
>>
>>
>>
>> On Tue, Dec 31, 2013 at 12:30 AM, Chalcy <[EMAIL PROTECTED]> wrote:
>>
>>> I have not tried this, but I believe you can change the --split-by column
>>> as you wish. --split-by is used to divide the work among the map tasks,
>>> while --check-column and --last-value are used for the incremental import.
>>>
>>> I do not know the exact scenario, but if empno gives a better split, you
>>> can still use it for the incremental import instead of changing the
>>> --split-by field.
>>>
>>> I would suggest you do a very simple test to find out.
>>>
>>> Hope this helps,
>>> Chalcy
>>>
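As a rough sketch of the distinction above, the script below assembles the daily import twice, once per --split-by column, and only prints the resulting commands (a dry run; nothing is executed). The --connect URL and the --target-dir path are placeholders that do not appear in the thread, and `build_import_cmd` is a hypothetical helper. In both variants --check-column and --last-value are identical, so the incremental filter does not change; only the column used to divide the work differs. Whether incremental append accepts a free-form --query on a given Sqoop version is an assumption worth confirming in a test environment.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholders (assumptions, not taken from the thread).
JDBC_URL="${JDBC_URL:-jdbc:mysql://dbhost/hr}"
TARGET_DIR="${TARGET_DIR:-/user/hive/external/employee}"
LAST_VALUE="${LAST_VALUE:-2013-05-01}"

# Print a shell-quoted sqoop command for the given --split-by column.
build_import_cmd() {
  local split_col="$1"
  local -a cmd=(
    sqoop import
    --connect "$JDBC_URL"
    # Single quotes keep $CONDITIONS literal so Sqoop can substitute it.
    --query 'select empno, name, date, loc from Employee where $CONDITIONS'
    --target-dir "$TARGET_DIR"
    --fields-terminated-by ','
    # Incremental settings: identical in both variants.
    --incremental append
    --check-column date
    --last-value "$LAST_VALUE"
    # Only this option differs between the two variants.
    --split-by "$split_col"
  )
  printf '%q ' "${cmd[@]}"
  printf '\n'
}

build_import_cmd empno   # keep the original split column
build_import_cmd date    # switch the split column to date
```

Comparing the two printed commands shows that the only difference is the final --split-by argument.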
>>>
>>> On Mon, Dec 30, 2013 at 1:18 PM, yogesh kumar <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hello all,
>>>>
>>>> I have done a first Sqoop import of a particular table, say table
>>>> Employee:
>>>>
>>>> sqoop import -libjars .....
>>>> --query "select empno, name, date, loc from Employee where
>>>> \$CONDITIONS ..  "
>>>> *--split-by empno*
>>>> --fields-terminated-by ','
>>>> .
>>>> .
>>>> .
>>>> .
>>>>
>>>> I have created an external table in Hive.
>>>>
>>>> *Now I want to pull data on a daily basis using an incremental pull. Can
>>>> I specify a different column for --split-by?*
>>>>
>>>> For example:
>>>>
>>>> sqoop import -libjars .....
>>>> --query "select empno, name, date, loc from Employee where
>>>> \$CONDITIONS ..  "
>>>> --check-column date
>>>> --incremental append
>>>> --last-value 2013-05-01
>>>> *--split-by date*
>>>> --split-by empno
>>>>
>>>>
>>>> Can I change the column for *split by in incremental sqoop*, if not
>>>