Hive user mailing list: Best practice for automating jobs


Tom Brown 2013-01-10, 22:03
Qiang Wang 2013-01-11, 01:31
Tom Brown 2013-01-11, 02:55
Qiang Wang 2013-01-11, 03:06
Tom Brown 2013-01-11, 03:17
Re: Best practice for automating jobs
Are you using the embedded metastore?
Only one process can connect to that metastore at a time.
2013/1/11 Tom Brown <[EMAIL PROTECTED]>

> When I've tried to create concurrent CLI sessions, I thought the 2nd
> one got an error about not being able to lock the metadata store.
>
> Is that error a real thing, or have I been mistaken this whole time?
>
> --Tom
>
>
> On Thursday, January 10, 2013, Qiang Wang wrote:
>
>> The HWI will create a CLI session for each query through the Hive libs, so
>> several queries can run concurrently.
>>
>>
>> 2013/1/11 Tom Brown <[EMAIL PROTECTED]>
>>
>>> How is concurrency achieved with this solution?
>>>
>>>
>>> On Thursday, January 10, 2013, Qiang Wang wrote:
>>>
>>>> I believe the HWI (Hive Web Interface) can give you a hand.
>>>>
>>>> https://github.com/anjuke/hwi
>>>>
>>>> You can use the HWI to submit and run queries concurrently.
>>>> Partition management can be achieved by creating crontabs using the HWI.
>>>>
>>>> It's simple and easy to use. Hope it helps.
>>>>
>>>> Regards,
>>>> Qiang
>>>>
>>>>
>>>> 2013/1/11 Tom Brown <[EMAIL PROTECTED]>
>>>>
>>>>> All,
>>>>>
>>>>> I want to automate jobs against Hive (using an external table with
>>>>> ever growing partitions), and I'm running into a few challenges:
>>>>>
>>>>> Concurrency - If I run Hive as a Thrift server, I can only safely run
>>>>> one job at a time. As such, it seems like my best bet will be to run
>>>>> it from the command line and set up a brand new instance for each job.
>>>>> That's quite a bit of hassle to solve a seemingly common problem, so
>>>>> I want to know: are there any accepted patterns or best practices
>>>>> for this?
>>>>>
>>>>> Partition management - New partitions will be added regularly. If I
>>>>> have to setup multiple instances of Hive for each (potentially)
>>>>> overlapping job, it will be difficult to keep track of the partitions
>>>>> that have been added. In the context of the preceding question, what
>>>>> is the best way to add metadata about new partitions?
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>> --Tom
>>>>>
>>>>
>>>>
>>
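The crontab-driven partition management Qiang suggests can be sketched as a small shell script. The table name `events`, the partition column `dt`, and the HDFS path are hypothetical stand-ins, not anything named in the thread; `ADD IF NOT EXISTS` makes the statement idempotent, so an overlapping cron run does not fail on a partition that was already registered:

```shell
# Build an idempotent partition-registration statement for one day.
# Assumptions (illustrative only): an external table `events`
# partitioned by a string column `dt`, with data laid out under
# /data/events/<date> in HDFS.
build_add_partition() {
  # $1 = partition date, e.g. 2013-01-11
  printf "ALTER TABLE events ADD IF NOT EXISTS PARTITION (dt='%s') LOCATION '/data/events/%s';" "$1" "$1"
}

STMT=$(build_add_partition "2013-01-11")
echo "$STMT"

# A crontab entry could then hand the statement to a fresh CLI process:
#   hive -e "$STMT"
```

Because each cron invocation spawns its own `hive` process, this also fits the one-instance-per-job approach Tom describes above.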
Sean McNamara 2013-01-10, 22:11
Dean Wampler 2013-01-10, 22:30
Alexander Alten-Lorenz 2013-01-11, 07:23
Manish Malhotra 2013-01-11, 18:56
Tom Brown 2013-01-11, 22:58
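The one-process-per-query pattern the thread converges on can be sketched as below. Note the assumption, per Qiang's point: this only works with a shared (remote) metastore, e.g. one backed by MySQL, since the embedded Derby metastore admits a single process at a time. `run_query` here is just a stand-in that echoes its argument; in a real setup it would invoke `hive -e "$1"`:

```shell
# Stand-in for `hive -e "$1"` so the sketch is self-contained;
# in production, replace the echo with the real CLI invocation.
run_query() {
  echo "would run: $1"
}

# Launch each query in its own process, then wait for all of them.
# With a remote metastore, the concurrent CLI processes do not
# contend for a metastore lock.
run_query "SELECT COUNT(*) FROM events WHERE dt='2013-01-10'" &
run_query "SELECT COUNT(*) FROM events WHERE dt='2013-01-11'" &
wait
```

The `events` table and `dt` column are hypothetical; the point is only the process-per-job structure.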