

Re: Reflect MySQL updates into Hive
Bottom line: use Sqoop to import the data into HBase/Cassandra for storage and
use Hive to query it through external tables. Did I miss anything?
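Just to check my own understanding of the external-table part, I imagine the
Hive side would look roughly like this (table name, columns, and the HBase
column mapping below are made-up examples, not our real schema):

CREATE EXTERNAL TABLE orders_hbase (
  order_id     STRING,
  status       STRING,
  total        DOUBLE,
  last_updated STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,d:status,d:total,d:last_updated"
)
TBLPROPERTIES ("hbase.table.name" = "orders");

-- analytical queries then run against the external table as usual
SELECT status, COUNT(*) FROM orders_hbase GROUP BY status;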
--
Ibrahim
On Mon, Dec 24, 2012 at 5:37 PM, Edward Capriolo <[EMAIL PROTECTED]>wrote:

> Hive cannot easily handle updates. The most creative way I saw this done was
> that someone managed to capture all updates and then used union queries that
> rewrote the same Hive table with the newest values.
>
> original + union delta + column with latest timestamp = new original
>
> But that is a lot of processing, especially when you may not have many
> updates. Hive has storage handlers that let you lay a table over HBase and
> Cassandra data. Store your data in those systems (they take updates), then
> use Hive to query it.
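> A minimal sketch of that union/latest-timestamp rewrite, assuming a Hive
> version with windowing functions and using invented table and column names:
>
> -- keep, per key, only the row with the newest timestamp across base + delta,
> -- and make the result the new "original" for the next cycle
> INSERT OVERWRITE TABLE orders_merged
> SELECT id, status, total, last_updated
> FROM (
>   SELECT u.*,
>          ROW_NUMBER() OVER (PARTITION BY id ORDER BY last_updated DESC) AS rn
>   FROM (
>     SELECT id, status, total, last_updated FROM orders
>     UNION ALL
>     SELECT id, status, total, last_updated FROM orders_delta
>   ) u
> ) ranked
> WHERE rn = 1;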
>
>
> On Mon, Dec 24, 2012 at 9:29 AM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>
>> Edward, can you explain more, please? Are you suggesting that I should use
>> HBase for such tasks instead of Hive?
>>
>>
>> --
>> Ibrahim
>>
>>
>> On Mon, Dec 24, 2012 at 5:28 PM, Edward Capriolo <[EMAIL PROTECTED]>wrote:
>>
>>> You can only do the last_update idea if this is an insert-only dataset.
>>>
>>> If your table takes updates, you need a different strategy:
>>> 1) Full dumps every interval (sketched below).
>>> 2) Using a storage handler like HBase or Cassandra that takes update
>>> operations.
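>>> A bare-bones sketch of option 1, assuming each full dump is first loaded
>>> into a staging table (all names here are invented):
>>>
>>> -- wipe and rebuild the warehouse table from the latest full dump
>>> INSERT OVERWRITE TABLE orders
>>> SELECT * FROM orders_full_dump_staging;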
>>>
>>>
>>>
>>> On Mon, Dec 24, 2012 at 9:22 AM, Jeremiah Peschka <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> If it were me, I would find a way to identify the partitions that have
>>>> modified data and then re-load a subset of the partitions (only the ones
>>>> with changes) on a regular basis. Instead of updating/deleting data, you'll
>>>> be re-loading specific partitions as an all-or-nothing action.
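>>>> For example, something like this, assuming the table is partitioned by day
>>>> (table, column, and partition names are purely illustrative):
>>>>
>>>> -- rebuild only a partition known to contain modified rows
>>>> INSERT OVERWRITE TABLE orders PARTITION (dt='2012-12-23')
>>>> SELECT id, status, total, last_updated
>>>> FROM orders_staging
>>>> WHERE dt = '2012-12-23';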
>>>>
>>>> On Monday, December 24, 2012, Ibrahim Yakti wrote:
>>>>
>>>>> This is already done, but Hive supports neither updates nor deletions of
>>>>> data, so when I import the records changed after a specific "last_update_time",
>>>>> Hive will append them rather than replace them.
>>>>>
>>>>>
>>>>> --
>>>>> Ibrahim
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 5:03 PM, Mohammad Tariq <[EMAIL PROTECTED]>wrote:
>>>>>
>>>>> You can use Apache Oozie to schedule your imports.
>>>>>
>>>>> Alternatively, you can have an additional column in your SQL table,
>>>>> say LastUpdatedTime or something. As soon as there is a change in this
>>>>> column, you can start the import from that point. This way you don't have to
>>>>> import everything every time there is a change in your table. You just
>>>>> have to move the most recent data, i.e. only the 'delta'.
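>>>>> Something along these lines, assuming the delta rows are first pulled into
>>>>> a staging table (table, column, and watermark values are just examples):
>>>>>
>>>>> -- append only rows modified since the last import
>>>>> INSERT INTO TABLE orders_raw
>>>>> SELECT id, status, total, last_updated
>>>>> FROM orders_delta_staging
>>>>> WHERE last_updated > '2012-12-23 00:00:00';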
>>>>>
>>>>> Best Regards,
>>>>> Tariq
>>>>> +91-9741563634
>>>>> https://mtariq.jux.com/
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 7:08 PM, Ibrahim Yakti <[EMAIL PROTECTED]>wrote:
>>>>>
>>>>> My question was how to reflect MySQL updates in Hadoop/Hive; this is
>>>>> our problem now.
>>>>>
>>>>>
>>>>> --
>>>>> Ibrahim
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 4:35 PM, Mohammad Tariq <[EMAIL PROTECTED]>wrote:
>>>>>
>>>>> Cool. Then go ahead :)
>>>>>
>>>>> Just in case you need something in real time, you can have a look at
>>>>> Impala. (I know nobody likes to get preached to, but just in case ;) )
>>>>>
>>>>> Best Regards,
>>>>> Tariq
>>>>> +91-9741563634
>>>>> https://mtariq.jux.com/
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 7:00 PM, Ibrahim Yakti <[EMAIL PROTECTED]>wrote:
>>>>>
>>>>> Thanks Mohammad. No, we do not have any plans to replace our RDBMS
>>>>> with Hive. Hadoop/Hive will be used for data warehousing & batch
>>>>> processing; as I said, we want to use Hive for analytical queries.
>>>>>
>>>>>
>>>>> --
>>>>> Ibrahim
>>>>>
>>>>>
>>>>> On Mon, Dec 24, 2012 at 4:19 PM, Mohammad Tariq <[EMAIL PROTECTED]>wrote:
>>>>>
>>>>> Hello Ibrahim,
>>>>>
>>>>>      A quick question. Are you planning to replace your SQL DB with
>>>>> Hive? If that is the case, I would not suggest doing that. Both are meant
>>>>> for entirely different purposes. Hive is for batch processing and not for
>>>>> real-time use.