Hive >> mail # user >> Reflect MySQL updates into Hive


Re: Reflect MySQL updates into Hive
Mohammad, I am not sure whether the answers and the link were meant for me
or for Kshiva's question.

If I partition my data by status, for example, then when I run the update
query it will add the updated data to a new partition (success or shipped,
for example) and keep the old data in the old one (confirmed or paid, for
example), right?
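
To make the scenario concrete, here is a minimal Python sketch. It assumes each partition behaves as an append-only collection of rows, which is a simplification and not a statement about Hive internals. The names `append_row` and `latest_per_order`, and the `order_id` and `updated_at` columns, are hypothetical. Under that assumption the old row stays in its original partition, so a query would have to pick the latest row per key:

```python
from collections import defaultdict

# Hypothetical model: each partition is an append-only list of rows,
# keyed by the value of the partition column (status).
partitions = defaultdict(list)

def append_row(order_id, status, updated_at):
    """Append a row to the partition for its status; nothing is overwritten."""
    partitions[status].append(
        {"order_id": order_id, "status": status, "updated_at": updated_at}
    )

def latest_per_order():
    """Keep only the most recent row per order_id across all partitions."""
    latest = {}
    for rows in partitions.values():
        for row in rows:
            current = latest.get(row["order_id"])
            if current is None or row["updated_at"] > current["updated_at"]:
                latest[row["order_id"]] = row
    return latest

append_row(1, "confirmed", "2012-12-24 10:00:00")
append_row(1, "shipped", "2012-12-25 09:00:00")  # the "update" arrives as a new row

# Both versions of order 1 are still stored, one per partition.
all_rows = [row for rows in partitions.values() for row in rows]
current_status = latest_per_order()[1]["status"]
```

The deduplication step compares ISO-formatted timestamps as strings, which works only because that format sorts chronologically.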
--
Ibrahim
On Tue, Dec 25, 2012 at 8:59 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Also, have a look at this :
> http://www.catb.org/~esr/faqs/smart-questions.html
>
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
>
>
> On Tue, Dec 25, 2012 at 11:26 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>
>> Have a look at Beeswax.
>>
>> BTW, do you have access to Google at your station? The same question was
>> posted on the Pig mailing list as well, and twice at that.
>>
>> Best Regards,
>> Tariq
>> +91-9741563634
>> https://mtariq.jux.com/
>>
>>
>> On Tue, Dec 25, 2012 at 11:20 AM, Kshiva Kps <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> Are there any Hive editors in which we can write 100 to 150 Hive scripts?
>>> I believe it is not easy to write all of the scripts in CLI mode.
>>> Something like an IDE for Java or TOAD for SQL. Please advise, many thanks.
>>>
>>>
>>> Thanks
>>>
>>> On Mon, Dec 24, 2012 at 8:21 PM, Dean Wampler <[EMAIL PROTECTED]> wrote:
>>>
>>>> This is not as hard as it sounds. The hardest part is setting up the
>>>> incremental query against your MySQL database. Then you can write the
>>>> results to new files in the HDFS directory for the table and Hive will see
>>>> them immediately. Yes, even though Hive doesn't support updates, it doesn't
>>>> care how many files are in the directory. The trick is to avoid lots of
>>>> little files.
>>>>
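
The incremental query mentioned above can be sketched as follows. This is only an illustration under stated assumptions: `sqlite3` stands in for MySQL so the example runs anywhere, and the `orders` table, its `updated_at` column, and the `fetch_changes` helper are hypothetical names, not part of the original setup.

```python
import sqlite3

# sqlite3 stands in for MySQL so the sketch is self-contained; the table
# and column names (orders, updated_at) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "paid", "2012-12-23 08:00:00"),
        (2, "confirmed", "2012-12-24 09:30:00"),
        (3, "shipped", "2012-12-24 17:45:00"),
    ],
)

def fetch_changes(conn, last_value):
    """Return rows modified after last_value, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_value,),
    ).fetchall()
    # If nothing changed, keep the previous mark for the next run.
    new_last_value = rows[-1][2] if rows else last_value
    return rows, new_last_value

# Only rows changed since the previous run are selected.
rows, last_value = fetch_changes(conn, "2012-12-24 00:00:00")
```

Each run would persist `last_value` somewhere durable and pass it to the next run, so that only new changes are written as new files. Batching the runs (for example daily rather than every few minutes) is one way to avoid lots of little files.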
>>>> As others have suggested, you should consider partitioning the data,
>>>> perhaps by time. If you import roughly a few HDFS blocks' worth of data
>>>> each day, use year/month/day partitioning to speed up your Hive queries.
>>>> You'll need to add the partitions to the table as you go, but you can
>>>> also add them in batch, for example once a month for all of that month's
>>>> partitions. Hive doesn't care if the partition directories don't exist
>>>> yet or are empty. I also recommend using an external table, which gives
>>>> you more flexibility on directory layout, etc.
>>>>
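
Adding a month of year/month/day partitions in one batch could look like the sketch below. It only generates the HiveQL text and does not run it. The table name `orders`, the base directory `/data/orders`, the integer-typed partition columns, and the helper `add_partition_statements` are all assumptions made for illustration.

```python
import calendar
from datetime import date

def add_partition_statements(table, base_dir, year, month):
    """Build one ALTER TABLE ... ADD PARTITION statement per day of the month."""
    days_in_month = calendar.monthrange(year, month)[1]
    statements = []
    for day in range(1, days_in_month + 1):
        d = date(year, month, day)
        # Directory layout is a choice, assumed here to mirror the partition keys.
        location = f"{base_dir}/year={d.year}/month={d.month:02d}/day={d.day:02d}"
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (year={d.year}, month={d.month}, day={d.day}) "
            f"LOCATION '{location}';"
        )
    return statements

# Hypothetical table and directory; one statement per day of December 2012.
statements = add_partition_statements("orders", "/data/orders", 2012, 12)
```

The resulting statements could be saved to a file and passed to the Hive CLI once a month. An explicit `LOCATION` is used because the table is assumed to be external.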
>>>> Sqoop might be the easiest tool for importing the data, as it will even
>>>> generate a Hive table schema from the original MySQL table. However, that
>>>> feature may not be useful in this case, as you already have the table.
>>>>
>>>> I think Oozie is horribly complex to use and overkill for this purpose.
>>>> A simple bash script triggered periodically by cron is all you need. If you
>>>> aren't using a partitioned table, you have a single sqoop command to run.
>>>> If you have partitioned data, you'll also need a hive statement in the
>>>> script to create the partition, unless you create the partitions in
>>>> batch, say once a month.
>>>>
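
The two steps of such a periodic job can be sketched like this. The code only builds the command lines and never executes them. The JDBC URL, user, table, check column, and directory layout are hypothetical, credential handling is omitted, and whether `lastmodified` or `append` mode fits depends on the source table. The same two command lines could just as well sit in a bash script run by cron.

```python
import shlex

def build_commands(day, last_value):
    """Return the two command lines for one import run (nothing is executed)."""
    year, month, dom = day.split("-")
    target_dir = f"/data/orders/year={year}/month={month}/day={dom}"
    sqoop_cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost/shop",  # hypothetical host and database
        "--username", "etl",                       # hypothetical user
        "--table", "orders",                       # hypothetical table
        "--incremental", "lastmodified",
        "--check-column", "updated_at",            # hypothetical timestamp column
        "--last-value", last_value,
        "--target-dir", target_dir,
    ]
    hive_cmd = [
        "hive", "-e",
        "ALTER TABLE orders ADD IF NOT EXISTS PARTITION "
        f"(year={int(year)}, month={int(month)}, day={int(dom)}) "
        f"LOCATION '{target_dir}'",
    ]
    return sqoop_cmd, hive_cmd

sqoop_cmd, hive_cmd = build_commands("2012-12-24", "2012-12-23 00:00:00")
# Shell-quoted forms, suitable for pasting into a script.
print(shlex.join(sqoop_cmd))
print(shlex.join(hive_cmd))
```

For an unpartitioned table only the first command would be needed, as the message above notes.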
>>>> Hope this helps,
>>>> dean
>>>>
>>>> On Mon, Dec 24, 2012 at 7:08 AM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> We are new to Hadoop and Hive. We are trying to use Hive to run
>>>>> analytical queries, and we are using Sqoop to import data into Hive. In
>>>>> our RDBMS the data is updated very frequently, and this needs to be
>>>>> reflected in Hive. Hive does not support update/delete, but there are
>>>>> many workarounds for this.
>>>>>
>>>>> What we have in mind is importing all the tables into Hive as is, then
>>>>> building the required tables for reporting.
>>>>>
>>>>> My questions are:
>>>>>
>>>>>    1. What is the best way to reflect MySQL updates into Hive with
>>>>>    minimal resources?
>>>>>    2. Is sqoop the right tool to do the ETL?
>>>>>    3. Is Hive the right tool for this kind of query, or should we
>>>>>    search for alternatives?
>>>>>
>>>>> Any hint will be useful, thanks in advance.
>>>>>
>>>>> --
>>>>> Ibrahim
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> *Dean Wampler, Ph.D.*
>