Re: SQOOP INCREMENTAL PULL ISSUE (PLEASE SUGGEST.)
Is the data changing during the time you are running the Sqoop import? The
--verbose option should put some interesting debug information in your logs,
such as the query being used at execution and the different WHERE clauses
being used for splitting.
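For example, a run along these lines (just a sketch: the connect string,
credentials, and target directory here are placeholders, not taken from your
command) should surface those details in the log:

sqoop import --verbose \
  --driver com.sybase.jdbc3.jdbc.SybDriver \
  --connect jdbc:sybase:Tds:dbhost:5000/mydb \
  --username myuser --password mypass \
  --query "select * from EMP where \$CONDITIONS and SAL > 201401200 and SAL <= 201401204" \
  --split-by DEPT \
  --target-dir /tmp/emp_debug \
  2>&1 | tee sqoop_import.log

The generated split boundaries and the per-mapper queries should then be
visible in sqoop_import.log.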

-Abe
On Mon, Jan 13, 2014 at 12:02 PM, yogesh kumar <[EMAIL PROTECTED]> wrote:

> Here is the code I have used:
>
>
> sqoop import -libjars
>  --driver com.sybase.jdbc3.jdbc.SybDriver \
>  --query "select * from EMP where \$CONDITIONS and SAL > 201401200 and SAL <= 201401204" \
>  --check-column Unique_value \
>  --incremental append \
>  --last-value 201401200 \
>  --split-by DEPT \
>  --fields-terminated-by ',' \
>  --target-dir ${TARGET_DIR}/${INC} \
>  --username ${SYBASE_USERNAME} \
>  --password ${SYBASE_PASSWORD} \
>
>
> Now I have imported the newly inserted data from the RDBMS into HDFS,
>
> but when I run
>
> select count(*), unique_value from EMP group by unique_value (both in the
> RDBMS and in Hive),
>
> I see a huge data loss.
>
> 1) in RDBMS
>
>   Count(*)    Unique_value
>   1000        201401201
>   5000        201401202
>   10000       201401203
>
>
> 2) in HIVE
>
>   Count(*)    Unique_value
>   189         201401201
>   421         201401202
>   50          201401203
>
>
> If I do
>
> select Unique_value from EMP;
>
> Result:
> 201401201
> 201401201
> 201401201
> 201401201
> 201401201
> .
> .
> 201401202
> .
> .
> and so on...
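>
> As a further check (just a sketch, reusing the variables from my command
> above), the raw HDFS output can also be counted directly, to rule out the
> Hive table definition:
>
> hdfs dfs -cat ${TARGET_DIR}/${INC}/part-* | wc -l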
>
>
> On Tue, Jan 14, 2014 at 1:29 AM, yogesh kumar <[EMAIL PROTECTED]> wrote:
>
>> Hello Abe,
>>
>> Thanks a zillion for your response,
>>
>> Yes, the unique_value is SAL over here.
>>
>> I have the option to test with a different directory, as it has only a
>> small amount of data (just 1 month of data).
>>
>> Yes, the interesting fact is that I did a Sqoop pull on a monthly basis, for
>> which I have not done an incremental import, and the data matches very
>> well; but for the daily pull I have to do an incremental import every day,
>>
>> and then I find some data loss...
>>
>> One of the Sqoop guys has suggested that I use " \$CONDITIONS " within this
>> query, as in:  ......  from EMP where \$CONDITIONS and SAL > 201401200
>> and SAL <= 201401204 \
>>
>> like
>>
>> EMP where SAL > 201401200 and SAL <= 201401204 and   \$CONDITIONS
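>>
>> So with the shell quoting, the complete option would presumably look
>> something like this (just a sketch, keeping the table and column names from
>> my command above):
>>
>> --query "select * from EMP where SAL > 201401200 and SAL <= 201401204 and \$CONDITIONS" \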
>>
>>
>>
>> Please do suggest something and help me out, as I have to deliver the
>> output to my client.
>>
>> On Mon, Jan 13, 2014 at 11:13 PM, Abraham Elmahrek <[EMAIL PROTECTED]> wrote:
>>
>>> Yogesh,
>>>
>>> Is unique_value in this case SAL? I'm a bit confused about your query.
>>>
>>> Do you have the option of running this query on a separate database
>>> somewhere to find the issue? I think it would be interesting to see the
>>> initial state and then the state after running an incremental import. That
>>> would tell us how many results are being imported after Sqoop has run, and
>>> we can validate each step. Also, please use the --verbose flag to get the
>>> most out of the logs.
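>>>
>>> For example, something along these lines (just a sketch: the connect string
>>> and credentials are placeholders, and it assumes your Hive table is also
>>> called emp), run before and after the incremental import, would give us
>>> concrete numbers to compare:
>>>
>>> # per-Unique_value counts on the Sybase side
>>> sqoop eval --driver com.sybase.jdbc3.jdbc.SybDriver \
>>>   --connect jdbc:sybase:Tds:dbhost:5000/mydb \
>>>   --username myuser --password mypass \
>>>   --query "select count(*), Unique_value from EMP group by Unique_value"
>>>
>>> # the same aggregation on the Hive side
>>> hive -e "select count(*), unique_value from emp group by unique_value"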
>>>
>>> -Abe
>>>
>>>
>>> On Mon, Jan 13, 2014 at 5:15 AM, Sharath Punreddy <[EMAIL PROTECTED]> wrote:
>>>
>>>> Yogesh,
>>>>
>>>> Please try to put $CONDITIONS after your where clause.
>>>>
>>>> Check out the examples in the blog below.
>>>>
>>>>
>>>> http://jugnu-life.blogspot.com/2012/03/sqoop-free-form-query-example.html?m=1
>>>>  On Jan 13, 2014 7:04 AM, "yogesh kumar" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hello Jarcec,
>>>>>
>>>>> I found the issue; I hope this is the cause. I got data loss by doing an
>>>>> incremental pull.
>>>>>
>>>>> I have cross-checked it and found that:
>>>>>
>>>>> sqoop import -libjars
>>>>>  --driver com.sybase.jdbc3.jdbc.SybDriver \
>>>>>  --query "select * from EMP where \$CONDITIONS and SAL > 201401200 and SAL <= 201401204" \
>>>>>  --check-column Unique_value \
>>>>>  --incremental append \
>>>>>  --last-value 201401200 \
>>>>>  --split-by DEPT \
>>>>>  --fields-terminated-by ',' \
>>>>>  --target-dir ${TARGET_DIR}/${INC} \
>>>>>  --username ${SYBASE_USERNAME} \
>>>>>  --password ${SYBASE_PASSWORD} \