Hive, mail # user - Reflect MySQL updates into Hive


Re: Reflect MySQL updates into Hive
Mohammad Tariq 2012-12-25, 05:59
Also, have a look at this:
http://www.catb.org/~esr/faqs/smart-questions.html

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/
On Tue, Dec 25, 2012 at 11:26 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Have a look at Beeswax.
>
> BTW, do you have access to Google at your station? You posted the same
> question on the Pig mailing list as well, twice.
>
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
>
>
> On Tue, Dec 25, 2012 at 11:20 AM, Kshiva Kps <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Are there any Hive editors in which we can write 100 to 150 Hive scripts?
>> I believe it is not easy to write all of these scripts in CLI mode.
>> Something like an IDE for Java or TOAD for SQL. Please advise, many thanks.
>>
>>
>> Thanks
>>
>> On Mon, Dec 24, 2012 at 8:21 PM, Dean Wampler <
>> [EMAIL PROTECTED]> wrote:
>>
>>> This is not as hard as it sounds. The hardest part is setting up the
>>> incremental query against your MySQL database. Then you can write the
>>> results to new files in the HDFS directory for the table and Hive will see
>>> them immediately. Yes, even though Hive doesn't support updates, it doesn't
>>> care how many files are in the directory. The trick is to avoid lots of
>>> little files.
>>>
>>> As others have suggested, you should consider partitioning the data,
>>> perhaps by time. If you import roughly a few HDFS blocks' worth of data each
>>> day, use year/month/day partitioning to speed up your Hive queries.
>>> You'll need to add the partitions to the table as you go, or you can add
>>> them in a batch, for example once a month for all of that month's days. Hive doesn't
>>> care if the partition directories don't exist yet or the directories are
>>> empty. I also recommend using an external table, which gives you more
>>> flexibility on directory layout, etc.
>>>
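
The monthly batch of partitions described above could be generated with something like the following. This is only a minimal sketch under stated assumptions: the table name `orders`, the base directory `/data/orders`, and integer `year`/`month`/`day` partition columns are all hypothetical and do not come from the thread. It only builds the HiveQL text; you would still feed the output to Hive yourself.

```python
import calendar


def month_partition_statements(table, base_dir, year, month):
    """Build one ALTER TABLE ... ADD PARTITION statement per day of a month.

    Assumes an external table partitioned by integer columns
    (year, month, day); the names and layout here are hypothetical.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    statements = []
    for day in range(1, days_in_month + 1):
        # Explicit LOCATION, so the directory layout is up to you.
        location = f"{base_dir}/year={year}/month={month:02d}/day={day:02d}"
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (year={year}, month={month}, day={day}) "
            f"LOCATION '{location}';"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical table and directory, for illustration only.
    for stmt in month_partition_statements("orders", "/data/orders", 2013, 1):
        print(stmt)
```

Writing the printed statements to a file and running that file with `hive -f` once a month would cover the whole month in one go. `IF NOT EXISTS` makes the script safe to re-run.
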
>>> Sqoop might be the easiest tool for importing the data, as it will even
>>> generate a Hive table schema from the original MySQL table. However, that
>>> feature may not be useful in this case, as you already have the table.
>>>
>>> I think Oozie is horribly complex to use and overkill for this purpose.
>>> A simple bash script triggered periodically by cron is all you need. If you
>>> aren't using a partitioned table, you have a single sqoop command to run.
>>> If you have partitioned data, you'll also need a hive statement in the
>>> script to create the partition, unless you do those in batch once a month,
>>> etc., etc.
>>>
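
For the periodic import step, here is a rough sketch of how the Sqoop command line might be assembled before cron runs it. The JDBC URL, table, `updated_at` column, and target directory are hypothetical placeholders, not values from the thread, and credentials are left out on purpose. The code only builds and prints the command; it does not run Sqoop.

```python
import shlex


def sqoop_incremental_command(jdbc_url, table, check_column, last_value, target_dir):
    """Build the argument list for an incremental Sqoop import.

    All argument values are supplied by the caller; the ones used in the
    example below are hypothetical.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        # Pick up rows whose timestamp column changed since the last run.
        "--incremental", "lastmodified",
        "--check-column", check_column,
        "--last-value", last_value,
        "--target-dir", target_dir,
        # Add new files next to existing ones instead of failing.
        "--append",
        # A single mapper writes a single file, which avoids many small files.
        "--num-mappers", "1",
    ]


if __name__ == "__main__":
    cmd = sqoop_incremental_command(
        "jdbc:mysql://db.example.com/shop",  # hypothetical database
        "orders",
        "updated_at",
        "2012-12-24 00:00:00",
        "/data/orders/year=2012/month=12/day=25",
    )
    print(shlex.join(cmd))
```

Note that with this approach an updated MySQL row lands in HDFS as an additional row, so the old version is still there. Your reporting queries, or a later compaction step, would need to pick the latest version per key.
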
>>> Hope this helps,
>>> dean
>>>
>>> On Mon, Dec 24, 2012 at 7:08 AM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> We are new to Hadoop and Hive. We are trying to use Hive to
>>>> run analytical queries, and we are using Sqoop to import data into Hive. In
>>>> our RDBMS the data is updated very frequently, and this needs to be reflected
>>>> in Hive. Hive does not support update/delete, but there are many workarounds
>>>> for this task.
>>>>
>>>> What we have in mind is importing all the tables into Hive as is, then
>>>> building the required tables for reporting.
>>>>
>>>> My questions are:
>>>>
>>>>    1. What is the best way to reflect MySQL updates into Hive with
>>>>    minimal resources?
>>>>    2. Is Sqoop the right tool to do the ETL?
>>>>    3. Is Hive the right tool for this kind of query, or should we
>>>>    search for alternatives?
>>>>
>>>> Any hint will be useful, thanks in advance.
>>>>
>>>> --
>>>> Ibrahim
>>>>
>>>
>>>
>>>
>>> --
>>> *Dean Wampler, Ph.D.*
>>> thinkbiganalytics.com
>>> +1-312-339-1330
>>>
>>>
>>
>