Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Reflect MySQL updates into Hive

Copy link to this message
Re: Reflect MySQL updates into Hive
Have a look at Beeswax.

BTW, do you have access to Google at your station?Same question on the Pig
mailing list as well, that too twice.

Best Regards,
On Tue, Dec 25, 2012 at 11:20 AM, Kshiva Kps <[EMAIL PROTECTED]> wrote:

> Hi,
> Is there any Hive editors and where we can write 100 to 150 Hive scripts
> I'm believing is not essay  to  do in CLI mode all scripts .
> Like IDE for JAVA /TOAD for SQL pls advice , many thanks
> Thanks
> On Mon, Dec 24, 2012 at 8:21 PM, Dean Wampler <
>> This is not as hard as it sounds. The hardest part is setting up the
>> incremental query against your MySQL database. Then you can write the
>> results to new files in the HDFS directory for the table and Hive will see
>> them immediately. Yes, even though Hive doesn't support updates, it doesn't
>> care how many files are in the directory. The trick is to avoid lots of
>> little files.
>> As others have suggested, you should consider partitioning the data,
>> perhaps by time. Say you import about a few HDFS blocks-worth of data each
>> day, then use year/month/day partitioning to speed up your Hive queries.
>> You'll need to add the partitions to the table as you go, but actually, you
>> can add those once a month, for example, for all partitions. Hive doesn't
>> care if the partition directories don't exist yet or the directories are
>> empty. I also recommend using an external table, which gives you more
>> flexibility on directory layout, etc.
>> Sqoop might be the easiest tool for importing the data, as it will even
>> generate a Hive table schema from the original MySQL table. However, that
>> feature may not be useful in this case, as you already have the table.
>> I think Oozie is horribly complex to use and overkill for this purpose. A
>> simple bash script triggered periodically by cron is all you need. If you
>> aren't using a partitioned table, you have a single sqoop command to run.
>> If you have partitioned data, you'll also need a hive statement in the
>> script to create the partition, unless you do those in batch once a month,
>> etc., etc.
>> Hope this helps,
>> dean
>> On Mon, Dec 24, 2012 at 7:08 AM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>>> Hi All,
>>> We are new to hadoop and hive, we are trying to use hive to
>>> run analytical queries and we are using sqoop to import data into hive, in
>>> our RDBMS the data updated very frequently and this needs to be reflected
>>> to hive. Hive does not support update/delete but there are many workarounds
>>> to do this task.
>>> What's in our mind is importing all the tables into hive as is, then we
>>> build the required tables for reporting.
>>> My questions are:
>>>    1. What is the best way to reflect MySQL updates into Hive with
>>>    minimal resources?
>>>    2. Is sqoop the right tool to do the ETL?
>>>    3. Is Hive the right tool to do this kind of queries or we should
>>>    search for alternatives?
>>> Any hint will be useful, thanks in advanced.
>>> --
>>> Ibrahim
>> --
>> *Dean Wampler, Ph.D.*
>> thinkbiganalytics.com
>> +1-312-339-1330