Hive >> mail # user >> Reflect MySQL updates into Hive


Re: Reflect MySQL updates into Hive
Good points by Edward. I especially love point no. 2.

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/
On Mon, Dec 24, 2012 at 7:58 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:

> You can only do the last_update idea if this is an insert-only dataset.
>
> If your table takes updates you need a different strategy:
> 1) Full dumps every interval.
> 2) Using a storage handler like HBase or Cassandra that supports update
> operations.
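Edward's second option could be sketched as a Hive table backed by the HBase storage handler. The table and column names below are hypothetical; the point is that rows written with the same row key overwrite earlier versions, so MySQL updates become HBase puts rather than appended duplicates:

```sql
-- Hypothetical orders table backed by HBase via the Hive storage handler.
-- Re-importing a changed row with the same id replaces the old values.
CREATE TABLE orders_hbase (
  id INT,                  -- mapped to the HBase row key
  total DOUBLE,
  last_update_time STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,cf:total,cf:last_update_time"
)
TBLPROPERTIES ("hbase.table.name" = "orders");
```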
>
>
>
> On Mon, Dec 24, 2012 at 9:22 AM, Jeremiah Peschka <
> [EMAIL PROTECTED]> wrote:
>
>> If it were me, I would find a way to identify the partitions that have
>> modified data and then re-load a subset of the partitions (only the ones
>> with changes) on a regular basis. Instead of updating/deleting data, you'll
>> be re-loading specific partitions as an all or nothing action.
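A minimal HiveQL sketch of this partition re-load idea, assuming a date-partitioned `orders` table and a staging table holding the fresh dump (both names illustrative):

```sql
-- Re-load only a partition known to contain modified rows.
-- INSERT OVERWRITE replaces the whole partition, so either the old
-- data or the new data is visible, never a mix of both.
INSERT OVERWRITE TABLE orders PARTITION (dt = '2012-12-24')
SELECT id, total, last_update_time
FROM   orders_staging
WHERE  dt = '2012-12-24';
```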
>>
>> On Monday, December 24, 2012, Ibrahim Yakti wrote:
>>
>>> This is already done, but Hive supports neither updates nor deletions of
>>> data, so when I import the records after a specific "last_update_time",
>>> Hive will append them rather than replace them.
>>>
>>>
>>> --
>>> Ibrahim
>>>
>>>
>>> On Mon, Dec 24, 2012 at 5:03 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>
>>> You can use Apache Oozie to schedule your imports.
>>>
>>> Alternatively, you can have an additional column in your SQL table, say
>>> LastUpdatedTime or something. As soon as there is a change in this column
>>> you can start the import from that point. This way you don't have to import
>>> everything every time there is a change in your table. You just have to
>>> move the most recent data, say only the 'delta' amount of data.
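With Sqoop 1, this delta import could be sketched roughly as below. The connection details, column, and key are placeholders, and exact flag combinations (for example, `--incremental lastmodified` together with `--hive-import`) vary by Sqoop version, so treat this as a sketch rather than a ready command:

```shell
# Pull only rows whose check column moved past the last recorded value;
# --merge-key folds updated rows into existing files instead of appending.
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username reporter -P \
  --table orders \
  --incremental lastmodified \
  --check-column LastUpdatedTime \
  --last-value "2012-12-24 00:00:00" \
  --merge-key id \
  --target-dir /warehouse/orders
```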
>>>
>>> Best Regards,
>>> Tariq
>>> +91-9741563634
>>> https://mtariq.jux.com/
>>>
>>>
>>> On Mon, Dec 24, 2012 at 7:08 PM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>>>
>>> My question was how to reflect MySQL updates to hadoop/hive, this is our
>>> problem now.
>>>
>>>
>>> --
>>> Ibrahim
>>>
>>>
>>> On Mon, Dec 24, 2012 at 4:35 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>
>>> Cool. Then go ahead :)
>>>
>>> Just in case you need something in realtime, you can have a look at
>>> Impala. (I know nobody likes to get preached, but just in case ;) )
>>>
>>> Best Regards,
>>> Tariq
>>> +91-9741563634
>>> https://mtariq.jux.com/
>>>
>>>
>>> On Mon, Dec 24, 2012 at 7:00 PM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>>>
>>> Thanks Mohammad, No, we do not have any plans to replace our RDBMS with
>>> Hive. Hadoop/Hive will be used as Data Warehouse & batch processing
>>> computing, as I said we want to use Hive for analytical queries.
>>>
>>>
>>> --
>>> Ibrahim
>>>
>>>
>>> On Mon, Dec 24, 2012 at 4:19 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>
>>> Hello Ibrahim,
>>>
>>>      A quick question. Are you planning to replace your SQL DB with Hive?
>>> If that is the case, I would not suggest doing that. Both are meant for
>>> entirely different purposes. Hive is for batch processing, not for
>>> real-time systems. So if your requirements involve real-time things, you
>>> need to think before moving ahead.
>>>
>>> Yes, Sqoop is 'the' tool. It is primarily meant for this purpose.
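For the initial load, a plain Sqoop-to-Hive import might look like the following; the connection string, credentials, and table names are placeholders:

```shell
# One-time full import of the MySQL table into a Hive table.
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username reporter -P \
  --table orders \
  --hive-import \
  --hive-table orders \
  --num-mappers 4
```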
>>>
>>> HTH
>>>
>>> Best Regards,
>>> Tariq
>>> +91-9741563634
>>> https://mtariq.jux.com/
>>>
>>>
>>> On Mon, Dec 24, 2012 at 6:38 PM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi All,
>>>
>>> We are new to Hadoop and Hive. We are trying to use Hive to
>>> run analytical queries, and we are using Sqoop to import data into Hive. In
>>> our RDBMS the data is updated very frequently, and this needs to be
>>> reflected in Hive. Hive does not support update/delete, but there are many
>>> workarounds to do this task.
>>>
>>> What's in our mind is importing all the
>>>
>>>
>>
>> --
>> ---
>> Jeremiah Peschka
>> Founder, Brent Ozar Unlimited
>> Microsoft SQL Server MVP
>>
>>
>