Hive, mail # user - Reflect MySQL updates into Hive


Re: Reflect MySQL updates into Hive
Ibrahim Yakti 2012-12-24, 14:28
What if many columns need to be updated? A simple example: confirmation
date, payment status(es) plus status update time, delivery, etc. On what
basis would you set the partition, and how would the old data be removed?
If I partition by payment status, for example, the updated row will be
loaded into a different partition than the old one.
--
Ibrahim
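
On the question of how the old version of a row gets removed: one workaround that does not depend on the partition column is to rebuild the table from the existing snapshot plus the newly imported rows, keeping only the newest version of each key. Below is a minimal sketch in plain Python (not Hive or Sqoop code), assuming every row has an immutable primary key `id` and the `last_update_time` column mentioned in this thread. The function names `select_delta` and `reconcile` and the sample rows are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def select_delta(source_rows: Iterable[Row], last_import_time: datetime) -> List[Row]:
    """Return only the rows changed since the previous import (the 'delta')."""
    return [r for r in source_rows if r["last_update_time"] > last_import_time]


def reconcile(base_rows: Iterable[Row], delta_rows: Iterable[Row],
              key: str = "id", ts: str = "last_update_time") -> List[Row]:
    """Merge base and delta, keeping the newest version of each key."""
    latest: Dict[Any, Row] = {}
    for row in list(base_rows) + list(delta_rows):
        current = latest.get(row[key])
        # On equal timestamps the later-seen row (from the delta) wins.
        if current is None or row[ts] >= current[ts]:
            latest[row[key]] = row
    return sorted(latest.values(), key=lambda r: r[key])


# Hypothetical data: what is already in the warehouse table.
base = [
    {"id": 1, "payment_status": "pending",
     "last_update_time": datetime(2012, 12, 24, 10, 0)},
    {"id": 2, "payment_status": "paid",
     "last_update_time": datetime(2012, 12, 24, 11, 0)},
]

# Hypothetical data: the current state of the source table.
source = [
    {"id": 1, "payment_status": "paid",
     "last_update_time": datetime(2012, 12, 24, 14, 0)},
    {"id": 2, "payment_status": "paid",
     "last_update_time": datetime(2012, 12, 24, 11, 0)},
    {"id": 3, "payment_status": "pending",
     "last_update_time": datetime(2012, 12, 24, 14, 5)},
]

delta = select_delta(source, last_import_time=datetime(2012, 12, 24, 12, 0))
merged = reconcile(base, delta)
```

Because the merge is keyed on `id` rather than on a mutable column such as payment status, a status change replaces the old row instead of leaving a stale copy behind. In a warehouse the same idea would be expressed as a query that rewrites the table, which is an assumption here rather than something proposed in the thread.
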
On Mon, Dec 24, 2012 at 5:25 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> I was actually trying to answer your actual questions. What are you
> currently doing to tackle this update problem, and what kind of tweak are
> you looking for? There is no direct, out-of-the-box solution for this, as
> you have said.
>
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
>
>
> On Mon, Dec 24, 2012 at 7:38 PM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>
>> This is already done, but Hive supports neither update nor deletion of
>> data, so when I import the records changed after a specific
>> "last_update_time", Hive appends them instead of replacing the old rows.
>>
>>
>> --
>> Ibrahim
>>
>>
>> On Mon, Dec 24, 2012 at 5:03 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>
>>> You can use Apache Oozie to schedule your imports.
>>>
>>> Alternatively, you can add a column to your SQL table, say
>>> LastUpdatedTime, that records when each row last changed. Each import
>>> can then start from the latest value already imported. This way you
>>> don't have to import the whole table every time something changes. You
>>> only move the most recent data, the 'delta'.
>>>
>>> Best Regards,
>>> Tariq
>>> +91-9741563634
>>> https://mtariq.jux.com/
>>>
>>>
>>> On Mon, Dec 24, 2012 at 7:08 PM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>>>
>>>> My question was how to reflect MySQL updates in Hadoop/Hive; this is
>>>> our problem now.
>>>>
>>>>
>>>> --
>>>> Ibrahim
>>>>
>>>>
>>>> On Mon, Dec 24, 2012 at 4:35 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Cool. Then go ahead :)
>>>>>
>>>>> Just in case you need something in real time, you can have a look at
>>>>> Impala. (I know nobody likes to be preached to, but just in case ;) ).
>>>>>
>>>>> Best Regards,
>>>>> Tariq
>>>>> +91-9741563634
>>>>> https://mtariq.jux.com/
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 7:00 PM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Thanks Mohammad. No, we do not have any plans to replace our RDBMS
>>>>>> with Hive. Hadoop/Hive will be used as a data warehouse and for batch
>>>>>> processing. As I said, we want to use Hive for analytical queries.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ibrahim
>>>>>>
>>>>>>
>>>>>> On Mon, Dec 24, 2012 at 4:19 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Hello Ibrahim,
>>>>>>>
>>>>>>>      A quick question. Are you planning to replace your SQL DB with
>>>>>>> Hive? If that is the case, I would not suggest doing that. The two are
>>>>>>> meant for entirely different purposes. Hive is for batch processing,
>>>>>>> not for real-time systems. So if your requirements involve real-time
>>>>>>> work, you need to think before moving ahead.
>>>>>>>
>>>>>>> Yes, Sqoop is 'the' tool. It is primarily meant for this purpose.
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Tariq
>>>>>>> +91-9741563634
>>>>>>> https://mtariq.jux.com/
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Dec 24, 2012 at 6:38 PM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> We are new to Hadoop and Hive. We are trying to use Hive to run
>>>>>>>> analytical queries, and we are using Sqoop to import data into Hive.
>>>>>>>> In our RDBMS the data is updated very frequently, and this needs to
>>>>>>>> be reflected in Hive. Hive does not support update/delete, but there
>>>>>>>> are many workarounds to do this task.
>>>>>>>>
>>>>>>>> What we have in mind is importing all the tables into Hive as is,
>>>>>>>> then building the required tables for reporting.
>>>>>>>>
>>>>>>>> My questions are:
>>>>>>>>
>>>>>>>>    1. What is the best way to reflect MySQL updates into Hive with