
Re: Reflect MySQL updates into Hive
Hello Ibrahim,

           Sorry for the late response. Those replies were for Kshiva. I
saw his question (exactly the same as this one) multiple times on the Pig
mailing list as well, so I just thought of giving him some pointers on how
to use the list. I should have specified that properly. Apologies for
creating the confusion.

Coming back to the actual point, yes, the flow is fine. Normally people do
it like this. But I was looking for an alternate way, so that we don't
have to go through this long process for the updates. I'll let you know
once I find something useful. But till now I haven't found anything better
than what Dean sir has suggested. Please do let me know if you find
something before me.

Many thanks.
Best Regards,
On Wed, Dec 26, 2012 at 7:24 PM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:

> After more reading, a suggested scenario looks like:
> MySQL ---(Extract / Load)---> HDFS ---> Load into HBase --> Read as
> external in Hive ---(Transform Data & Join Tables)--> Use hive for Joins &
> Queries ---> Update HBase as needed & Reload in Hive.
> What do you think please?
> --
> Ibrahim
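A minimal HiveQL sketch of the "read as external in Hive" step in the flow above; the HBase table name, column family, and columns are illustrative assumptions, not taken from the thread:

    -- Map an existing HBase table (assumed here to be named 'orders', loaded
    -- from the MySQL extract) into Hive; column family 'd' and the columns
    -- below are examples only.
    CREATE EXTERNAL TABLE hbase_orders (
      order_id   BIGINT,
      status     STRING,
      total      DOUBLE,
      updated_at STRING
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES (
      'hbase.columns.mapping' = ':key,d:status,d:total,d:updated_at'
    )
    TBLPROPERTIES ('hbase.table.name' = 'orders');

    -- Joins and transformations then run in Hive as usual, e.g.
    -- SELECT status, COUNT(*) FROM hbase_orders GROUP BY status;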
> On Wed, Dec 26, 2012 at 9:27 AM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>> Mohammad, I am not sure if the answers & the link were meant for me or for
>> Kshiva's question.
>> If I have partitioned my data based on status, for example, when I run the
>> update query it will add the updated data to a new partition (success or
>> shipped, for example) and it will keep the old data (confirmed or paid, for
>> example), right?
>> --
>> Ibrahim
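For the partitioning-by-status question above, a rough sketch of how that behaves; the table, paths, and the 'staging_orders' source are hypothetical. An insert only writes into the target status partition, so earlier rows for the same order stay in their old partition until that partition is rewritten or dropped:

    -- Hypothetical status-partitioned layout.
    CREATE EXTERNAL TABLE orders_by_status (
      order_id   BIGINT,
      total      DOUBLE,
      updated_at STRING
    )
    PARTITIONED BY (status STRING)
    LOCATION '/data/orders_by_status';

    -- An order that moved from 'paid' to 'shipped' lands in the 'shipped'
    -- partition; the stale 'paid' row remains until handled separately.
    INSERT INTO TABLE orders_by_status PARTITION (status = 'shipped')
    SELECT order_id, total, updated_at
    FROM   staging_orders
    WHERE  status = 'shipped';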
>> On Tue, Dec 25, 2012 at 8:59 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>> Also, have a look at this :
>>> http://www.catb.org/~esr/faqs/smart-questions.html
>>> Best Regards,
>>> Tariq
>>> +91-9741563634
>>> https://mtariq.jux.com/
>>> On Tue, Dec 25, 2012 at 11:26 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>> Have a look at Beeswax.
>>>> BTW, do you have access to Google at your station? The same question was
>>>> asked on the Pig mailing list as well, that too twice.
>>>> Best Regards,
>>>> Tariq
>>>> +91-9741563634
>>>> https://mtariq.jux.com/
>>>> On Tue, Dec 25, 2012 at 11:20 AM, Kshiva Kps <[EMAIL PROTECTED]> wrote:
>>>>> Hi,
>>>>> Is there any Hive editor where we can write 100 to 150 Hive
>>>>> scripts? I believe it is not easy to do all the scripts in CLI mode.
>>>>> Something like an IDE for Java / TOAD for SQL. Please advise, many thanks.
>>>>> Thanks
>>>>> On Mon, Dec 24, 2012 at 8:21 PM, Dean Wampler <[EMAIL PROTECTED]> wrote:
>>>>>> This is not as hard as it sounds. The hardest part is setting up the
>>>>>> incremental query against your MySQL database. Then you can write the
>>>>>> results to new files in the HDFS directory for the table and Hive will see
>>>>>> them immediately. Yes, even though Hive doesn't support updates, it doesn't
>>>>>> care how many files are in the directory. The trick is to avoid lots of
>>>>>> little files.
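A sketch of the incremental query against MySQL that this refers to; the 'orders' table, 'updated_at' column, and bookmark timestamp are assumptions:

    -- Pull only rows changed since the last successful import, then write the
    -- result as a new file under the Hive table's HDFS directory.
    SELECT *
    FROM   orders
    WHERE  updated_at > '2012-12-25 00:00:00'  -- bookmark from the previous run
    ORDER  BY updated_at;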
>>>>>> As others have suggested, you should consider partitioning the data,
>>>>>> perhaps by time. Say you import a few HDFS blocks' worth of data each
>>>>>> day, then use year/month/day partitioning to speed up your Hive queries.
>>>>>> You'll need to add the partitions to the table as you go, but actually, you
>>>>>> can add those once a month, for example, for all partitions. Hive doesn't
>>>>>> care if the partition directories don't exist yet or the directories are
>>>>>> empty. I also recommend using an external table, which gives you more
>>>>>> flexibility on directory layout, etc.
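Along the lines described above, a hedged HiveQL sketch of an external, date-partitioned table with partitions registered ahead of time; names and HDFS paths are illustrative:

    -- External table, so Hive does not own the directory layout.
    CREATE EXTERNAL TABLE orders (
      order_id BIGINT,
      status   STRING,
      total    DOUBLE
    )
    PARTITIONED BY (year INT, month INT, day INT)
    LOCATION '/data/orders';

    -- Partitions can be added in advance, e.g. a month at a time; Hive does
    -- not mind if the directories are empty or missing at this point.
    ALTER TABLE orders ADD IF NOT EXISTS
      PARTITION (year = 2012, month = 12, day = 26)
      LOCATION '/data/orders/2012/12/26';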
>>>>>> Sqoop might be the easiest tool for importing the data, as it will
>>>>>> even generate a Hive table schema from the original MySQL table. However,
>>>>>> that feature may not be useful in this case, as you already have the table.
>>>>>> I think Oozie is horribly complex to use and overkill for this