Hive, mail # user - Reflect MySQL updates into Hive


Re: Reflect MySQL updates into Hive
Ibrahim Yakti 2012-12-26, 14:56
Thanks Mohammad, I will be waiting. Meanwhile, it seems I will get into
HBase and give it a try, unless someone suggests something
better/easier.
--
Ibrahim
On Wed, Dec 26, 2012 at 5:52 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> Hello Ibrahim,
>
> Sorry for the late response. Those replies were for Kshiva. I saw his
> question (exactly the same as the one he posted here) multiple times on
> the Pig mailing list as well, so I just thought of giving him some
> pointers on how to use the list. I should have specified it properly.
> Apologies for the confusion.
>
> Coming back to the actual point, yes the flow is fine. Normally people do
> it like this. But I was looking for some alternate way, so that we don't
> have to go through this long process for the updates. I'll let you know
> once I find something useful. But till now I haven't found anything better
> than whatever Dean sir has suggested. Please, do let me know if you find
> something before me.
>
> Many thanks.
>
>
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
>
>
> On Wed, Dec 26, 2012 at 7:24 PM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>
>> After more reading, a suggested scenario looks like:
>>
>> MySQL ---(Extract / Load)---> HDFS ---> Load into HBase ---> Read as
>> external table in Hive ---(Transform Data & Join Tables)---> Use Hive for
>> Joins & Queries ---> Update HBase as needed & Reload in Hive.
>>
>> What do you think please?
>>
>>
>>
>> --
>> Ibrahim
>>
>>
>> On Wed, Dec 26, 2012 at 9:27 AM, Ibrahim Yakti <[EMAIL PROTECTED]> wrote:
>>
>>> Mohammad, I am not sure if the answers & the link were to me or to
>>> Kshiva's question.
>>>
>>> If I have partitioned my data based on status, for example, then when I
>>> run the update query it will add the updated data to a new partition
>>> (success or shipped, for example) and it will keep the old data
>>> (confirmed or paid, for example), right?
>>>
>>>
>>> --
>>> Ibrahim
>>>
>>>
>>> On Tue, Dec 25, 2012 at 8:59 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>
>>>> Also, have a look at this :
>>>> http://www.catb.org/~esr/faqs/smart-questions.html
>>>>
>>>> Best Regards,
>>>> Tariq
>>>> +91-9741563634
>>>> https://mtariq.jux.com/
>>>>
>>>>
>>>> On Tue, Dec 25, 2012 at 11:26 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Have a look at Beeswax.
>>>>>
>>>>> BTW, do you have access to Google at your station? The same question
>>>>> is on the Pig mailing list as well, and twice at that.
>>>>>
>>>>> Best Regards,
>>>>> Tariq
>>>>> +91-9741563634
>>>>> https://mtariq.jux.com/
>>>>>
>>>>>
>>>>> On Tue, Dec 25, 2012 at 11:20 AM, Kshiva Kps <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Are there any Hive editors in which we can write 100 to 150 Hive
>>>>>> scripts? I believe it is not easy to do all of the scripts in CLI mode.
>>>>>> Something like an IDE for Java or TOAD for SQL. Please advise, many thanks.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Mon, Dec 24, 2012 at 8:21 PM, Dean Wampler <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> This is not as hard as it sounds. The hardest part is setting up the
>>>>>>> incremental query against your MySQL database. Then you can write the
>>>>>>> results to new files in the HDFS directory for the table and Hive will see
>>>>>>> them immediately. Yes, even though Hive doesn't support updates, it doesn't
>>>>>>> care how many files are in the directory. The trick is to avoid lots of
>>>>>>> little files.
>>>>>>>
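As a rough illustration of the incremental approach described above, the sketch below builds a "rows changed since the last import" query and names one new file per batch for the table's HDFS directory. It is only a sketch under assumptions: the table name `orders`, the timestamp column `updated_at`, the directory path, the `batch_...tsv` naming scheme, and a tab-delimited text table are all hypothetical and not taken from this thread. Running the query against MySQL and copying the file into HDFS (for example with `hadoop fs -put`) are left out.

```python
import csv
import io
from datetime import datetime


def incremental_query(table, ts_column, last_import):
    """Build a SELECT for rows changed since the previous import.

    Returns (sql, params). The %s placeholder style is the one used by
    common Python MySQL drivers. Table and column names are interpolated
    directly, so they must come from trusted configuration.
    """
    sql = f"SELECT * FROM {table} WHERE {ts_column} > %s ORDER BY {ts_column}"
    return sql, (last_import,)


def batch_file_name(table_dir, batch_time):
    """Name one new file per import batch; earlier files are never rewritten."""
    return f"{table_dir}/batch_{batch_time:%Y%m%d_%H%M%S}.tsv"


def rows_to_tsv(rows):
    """Serialize rows as tab-separated text (a delimited-text table is assumed)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    sql, params = incremental_query("orders", "updated_at", "2012-12-24 00:00:00")
    print(sql, params)
    print(batch_file_name("/user/hive/warehouse/orders", datetime(2012, 12, 24, 20, 21)))
    print(rows_to_tsv([(1, "paid"), (2, "shipped")]), end="")
```

Writing one reasonably large file per batch, rather than one file per row or per small query, is one way to stay clear of the "lots of little files" problem mentioned above.
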
>>>>>>> As others have suggested, you should consider partitioning the data,
>>>>>>> perhaps by time. If you import a few HDFS blocks' worth of data each
>>>>>>> day, use year/month/day partitioning to speed up your Hive queries.
>>>>>>> You'll need to add the partitions to the table as you go, but actually, you
>>>>>>> can add those once a month, for example, for all partitions. Hive doesn't
>>>>>>> care if the partition directories don't exist yet or the directories are
>>>>>>> empty. I also recommend using an external table, which gives you more