Hive, mail # user - Reflect MySQL updates into Hive


Ibrahim Yakti 2012-12-24, 13:08
Dean Wampler 2012-12-24, 14:51
Ibrahim Yakti 2012-12-24, 15:34
Dean Wampler 2012-12-24, 18:12
Ibrahim Yakti 2012-12-24, 18:25
Kshiva Kps 2012-12-25, 05:50
Mohammad Tariq 2012-12-25, 05:56
Mohammad Tariq 2012-12-25, 05:59
Ibrahim Yakti 2012-12-26, 06:27
Ibrahim Yakti 2012-12-26, 13:54
Mohammad Tariq 2012-12-26, 14:52
Ibrahim Yakti 2012-12-26, 14:56
Mohammad Tariq 2012-12-24, 13:19
Ibrahim Yakti 2012-12-24, 13:30
Mohammad Tariq 2012-12-24, 13:35
Ibrahim Yakti 2012-12-24, 13:38
Mohammad Tariq 2012-12-24, 14:03
Ibrahim Yakti 2012-12-24, 14:08
Mohammad Tariq 2012-12-24, 14:25
Ibrahim Yakti 2012-12-24, 14:28
Jeremiah Peschka 2012-12-24, 14:22
Edward Capriolo 2012-12-24, 14:28
Mohammad Tariq 2012-12-24, 14:31
Ibrahim Yakti 2012-12-24, 14:29
Edward Capriolo 2012-12-24, 14:37
Re: Reflect MySQL updates into Hive
Ibrahim Yakti 2012-12-24, 14:41
Bottom line: use Sqoop to import the data into HBase/Cassandra for storage,
then use Hive to query it through external tables. Did I miss anything?
--
Ibrahim
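
The "external tables over HBase" idea summarized above can be sketched in
HiveQL via Hive's HBase storage handler. The table, column-family, and
column names below are illustrative assumptions, not from the thread:

```sql
-- Hypothetical sketch: a Hive external table laid over an existing
-- HBase table named "orders" with a single column family "d".
CREATE EXTERNAL TABLE orders_hbase (
  id               BIGINT,
  status           STRING,
  total            DOUBLE,
  last_update_time STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,d:status,d:total,d:last_update_time"
)
TBLPROPERTIES ("hbase.table.name" = "orders");

-- Analytical queries then run through Hive while HBase absorbs row updates:
SELECT status, COUNT(*) FROM orders_hbase GROUP BY status;
```

HBase handles the in-place updates; Hive only ever reads the current state.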
On Mon, Dec 24, 2012 at 5:37 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:

> Hive cannot easily handle updates. The most creative way I have seen this
> done was someone capturing all updates and then using union queries to
> rewrite the same Hive table with the newest value:
>
> original + union delta + column with latest timestamp = new original
>
> But that is a lot of processing, especially when you may not have many
> updates. Hive has storage handlers that let you lay a table over HBase and
> Cassandra data. Store your data in those systems, which take updates, then
> use Hive to query them.
>
>
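The "union + latest timestamp" rewrite Edward describes could be sketched
like this in HiveQL. The `orders` / `orders_delta` table names and the
`id` / `last_update_time` columns are assumptions for illustration, and the
result is written to a separate merged table to avoid reading and
overwriting the same table in one statement:

```sql
-- Hypothetical sketch: keep, for each key, the row with the
-- latest last_update_time across the original table and the delta.
INSERT OVERWRITE TABLE orders_merged
SELECT t.id, t.status, t.total, t.last_update_time
FROM (
  SELECT * FROM orders
  UNION ALL
  SELECT * FROM orders_delta
) t
JOIN (
  SELECT id, MAX(last_update_time) AS max_ts
  FROM (
    SELECT * FROM orders
    UNION ALL
    SELECT * FROM orders_delta
  ) u
  GROUP BY id
) latest
ON t.id = latest.id AND t.last_update_time = latest.max_ts;
```

As the message notes, this rewrites the whole table on every merge, which
is expensive when the delta is small relative to the table.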
> On Mon, Dec 24, 2012 at 9:29 AM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>
>> Edward, can you explain more please? Are you suggesting that I should use
>> HBase for such tasks instead of Hive?
>>
>>
>> --
>> Ibrahim
>>
>>
>> On Mon, Dec 24, 2012 at 5:28 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>>
>>> You can only do the last_update idea if this is an insert-only dataset.
>>>
>>> If your table takes updates, you need a different strategy:
>>> 1) full dumps every interval, or
>>> 2) using a storage handler like HBase or Cassandra that accepts update
>>> operations.
>>>
>>>
>>>
>>> On Mon, Dec 24, 2012 at 9:22 AM, Jeremiah Peschka <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> If it were me, I would find a way to identify the partitions that have
>>>> modified data and then re-load that subset of partitions (only the ones
>>>> with changes) on a regular basis. Instead of updating/deleting data, you'll
>>>> be re-loading specific partitions as an all-or-nothing action.
>>>>
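Jeremiah's partition re-load approach might look roughly like this,
assuming a table partitioned by day and a staging table holding freshly
imported rows (all names hypothetical):

```sql
-- Hypothetical sketch: atomically replace only the partitions
-- known to contain modified rows, leaving the rest untouched.
INSERT OVERWRITE TABLE orders PARTITION (dt = '2012-12-24')
SELECT id, status, total, last_update_time
FROM orders_staging
WHERE dt = '2012-12-24';
```

INSERT OVERWRITE on a partition replaces that partition's contents in one
shot, which is what makes the re-load all-or-nothing per partition.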
>>>> On Monday, December 24, 2012, Ibrahim Yakti wrote:
>>>>
>>>>> This is already done, but Hive supports neither update nor deletion of
>>>>> data, so when I import the records after a specific "last_update_time",
>>>>> Hive appends them rather than replacing them.
>>>>>
>>>>>
>>>>> --
>>>>> Ibrahim
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 5:03 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> You can use Apache Oozie to schedule your imports.
>>>>>
>>>>> Alternatively, you can have an additional column in your SQL table,
>>>>> say LastUpdatedTime or something. As soon as there is a change in this
>>>>> column you can start the import from that point. This way you don't have
>>>>> to import everything every time there is a change in your table. You just
>>>>> have to move the most recent data, say only the 'delta' amount of data.
>>>>>
>>>>> Best Regards,
>>>>> Tariq
>>>>> +91-9741563634
>>>>> https://mtariq.jux.com/
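Tariq's LastUpdatedTime suggestion maps onto Sqoop's incremental import
mode. A sketch, with the connection string, table, and column names as
placeholders:

```shell
# Hypothetical sketch of a Sqoop incremental import driven by an
# update-timestamp column; all connection details and names are
# placeholders, not values from the thread.
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username etl \
  --table orders \
  --incremental lastmodified \
  --check-column LastUpdatedTime \
  --last-value "2012-12-24 00:00:00" \
  --merge-key id \
  --target-dir /user/hive/warehouse/orders_delta
```

With `--incremental lastmodified`, Sqoop fetches only rows whose
check-column is newer than `--last-value`; as Ibrahim points out below,
loading that delta into a plain Hive table still appends rather than
updates, which is where the merge or storage-handler strategies come in.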
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 7:08 PM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> My question was how to reflect MySQL updates into Hadoop/Hive; this is
>>>>> our problem now.
>>>>>
>>>>>
>>>>> --
>>>>> Ibrahim
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 4:35 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Cool. Then go ahead :)
>>>>>
>>>>> Just in case you need something in real time, you can have a look at
>>>>> Impala. (I know nobody likes to get preached to, but just in case ;) )
>>>>>
>>>>> Best Regards,
>>>>> Tariq
>>>>> +91-9741563634
>>>>> https://mtariq.jux.com/
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 7:00 PM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Thanks Mohammad. No, we do not have any plans to replace our RDBMS
>>>>> with Hive. Hadoop/Hive will be used for data warehousing & batch
>>>>> processing; as I said, we want to use Hive for analytical queries.
>>>>>
>>>>>
>>>>> --
>>>>> Ibrahim
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 4:19 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Hello Ibrahim,
>>>>>
>>>>>      A quick question. Are you planning to replace your SQL DB with
>>>>> Hive? If that is the case, I would not suggest doing that. Both are meant
>>>>> for entirely different purposes. Hive is for batch processing and not for