Hive user mailing list: Reflect MySQL updates into Hive


Re: Reflect MySQL updates into Hive
After more reading, a suggested scenario looks like this:

MySQL ---(extract/load)---> HDFS ---> load into HBase ---> read as an
external table in Hive ---(transform data & join tables)---> use Hive for
joins & queries ---> update HBase as needed & reload in Hive.
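
For the "read as external in Hive" step, I assume something like the
standard Hive HBase storage handler (just a sketch; the table and column
names below are placeholders):

-- expose the HBase table "orders" (column family "d") to Hive; updates
-- made in HBase become visible to Hive queries without reloading anything
CREATE EXTERNAL TABLE orders_hbase (order_id STRING, status STRING, total DOUBLE)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,d:status,d:total")
TBLPROPERTIES ("hbase.table.name" = "orders");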

What do you think?

--
Ibrahim
On Wed, Dec 26, 2012 at 9:27 AM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:

> Mohammad, I am not sure if the answers & the link were meant for me or
> for Kshiva's question.
>
> If I have partitioned my data based on status, for example, then when I
> run the update query it will add the updated data to a new partition
> ('success' or 'shipped', for example) and it will keep the old data
> ('confirmed' or 'paid', for example), right?
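>
> For example, just to illustrate what I mean (a sketch, assuming a table
> partitioned by status; orders_staging is a made-up staging table):
>
> -- the row is first written under the 'paid' partition
> INSERT OVERWRITE TABLE orders PARTITION (status='paid')
> SELECT order_id, total FROM orders_staging WHERE status = 'paid';
> -- after the update it is written again under 'shipped'; the stale copy
> -- stays in the 'paid' partition unless that partition is overwritten
> INSERT OVERWRITE TABLE orders PARTITION (status='shipped')
> SELECT order_id, total FROM orders_staging WHERE status = 'shipped';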
>
>
> --
> Ibrahim
>
>
> On Tue, Dec 25, 2012 at 8:59 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>
>> Also, have a look at this :
>> http://www.catb.org/~esr/faqs/smart-questions.html
>>
>> Best Regards,
>> Tariq
>> +91-9741563634
>> https://mtariq.jux.com/
>>
>>
>> On Tue, Dec 25, 2012 at 11:26 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>
>>> Have a look at Beeswax.
>>>
>>> BTW, do you have access to Google at your station? You have asked the
>>> same question on the Pig mailing list as well, twice.
>>>
>>> Best Regards,
>>> Tariq
>>> +91-9741563634
>>> https://mtariq.jux.com/
>>>
>>>
>>> On Tue, Dec 25, 2012 at 11:20 AM, Kshiva Kps <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Are there any Hive editors in which we can write 100 to 150 Hive
>>>> scripts? I believe it is not easy to do all of those scripts in CLI
>>>> mode. Something like an IDE for Java or TOAD for SQL. Please advise,
>>>> many thanks.
>>>>
>>>>
>>>> Thanks
>>>>
>>>> On Mon, Dec 24, 2012 at 8:21 PM, Dean Wampler <
>>>> [EMAIL PROTECTED]> wrote:
>>>>
>>>>> This is not as hard as it sounds. The hardest part is setting up the
>>>>> incremental query against your MySQL database. Then you can write the
>>>>> results to new files in the HDFS directory for the table and Hive will see
>>>>> them immediately. Yes, even though Hive doesn't support updates, it doesn't
>>>>> care how many files are in the directory. The trick is to avoid lots of
>>>>> little files.
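>>>>>
>>>>> For the incremental pull, something like this (an untested sketch;
>>>>> the connection string, column names, and paths are placeholders):
>>>>>
>>>>> # import only rows changed since the last run into the table's
>>>>> # warehouse directory; Hive sees the new files on the next query
>>>>> sqoop import \
>>>>>   --connect jdbc:mysql://dbhost/shop --username etl --password '...' \
>>>>>   --table orders \
>>>>>   --incremental lastmodified --check-column updated_at \
>>>>>   --last-value "2012-12-25 00:00:00" --append \
>>>>>   --target-dir /user/hive/warehouse/orders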
>>>>>
>>>>> As others have suggested, you should consider partitioning the data,
>>>>> perhaps by time. Say you import a few HDFS blocks' worth of data each
>>>>> day; then use year/month/day partitioning to speed up your Hive
>>>>> queries. You'll need to add the partitions to the table as you go,
>>>>> but you can add them in batches, for example once a month for all of
>>>>> that month's partitions. Hive doesn't care if the partition
>>>>> directories don't exist yet or are empty. I also recommend using an
>>>>> external table, which gives you more flexibility in directory layout,
>>>>> etc.
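>>>>>
>>>>> For example (a sketch; the columns and directory layout are
>>>>> placeholders):
>>>>>
>>>>> CREATE EXTERNAL TABLE orders (order_id INT, total DOUBLE)
>>>>> PARTITIONED BY (year STRING, month STRING, day STRING)
>>>>> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
>>>>> LOCATION '/data/orders';
>>>>>
>>>>> -- partitions can be registered ahead of time; empty or missing
>>>>> -- directories are fine
>>>>> ALTER TABLE orders ADD IF NOT EXISTS
>>>>> PARTITION (year='2012', month='12', day='26')
>>>>> LOCATION '/data/orders/2012/12/26';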
>>>>>
>>>>> Sqoop might be the easiest tool for importing the data, as it will
>>>>> even generate a Hive table schema from the original MySQL table. However,
>>>>> that feature may not be useful in this case, as you already have the table.
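>>>>>
>>>>> For reference, that schema-generating variant would look roughly
>>>>> like this (same placeholder connection string as above):
>>>>>
>>>>> sqoop import --connect jdbc:mysql://dbhost/shop --username etl \
>>>>>   --table orders --hive-import --create-hive-table --hive-table orders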
>>>>>
>>>>> I think Oozie is horribly complex to use and overkill for this
>>>>> purpose. A simple bash script triggered periodically by cron is all
>>>>> you need. If you aren't using a partitioned table, you have a single
>>>>> sqoop command to run. If you have partitioned data, you'll also need
>>>>> a hive statement in the script to create the partition, unless you
>>>>> create those in batches once a month, as above.
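>>>>>
>>>>> A minimal sketch of such a script (names and paths are made up, and
>>>>> the last-value bookkeeping is left to you):
>>>>>
>>>>> #!/bin/bash
>>>>> # nightly from cron: import the latest changes, then register the
>>>>> # matching partition
>>>>> Y=$(date +%Y); M=$(date +%m); D=$(date +%d)
>>>>> sqoop import \
>>>>>   --connect jdbc:mysql://dbhost/shop --username etl --password '...' \
>>>>>   --table orders \
>>>>>   --incremental lastmodified --check-column updated_at \
>>>>>   --last-value "$(cat /var/etl/orders.last)" --append \
>>>>>   --target-dir /data/orders/$Y/$M/$D
>>>>> hive -e "ALTER TABLE orders ADD IF NOT EXISTS
>>>>>          PARTITION (year='$Y', month='$M', day='$D')
>>>>>          LOCATION '/data/orders/$Y/$M/$D';"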
>>>>>
>>>>> Hope this helps,
>>>>> dean
>>>>>
>>>>> On Mon, Dec 24, 2012 at 7:08 AM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> We are new to Hadoop and Hive. We are trying to use Hive to run
>>>>>> analytical queries, and we are using Sqoop to import data into Hive.
>>>>>> In our RDBMS the data is updated very frequently, and this needs to
>>>>>> be reflected in Hive. Hive does not support update/delete, but there
>>>>>> are many workarounds for this task.
>>>>>>
>>>>>> What we have in mind is importing all the tables into Hive as is,
>>>>>> and then building the required tables for reporting.