Hive >> mail # user >> Best practice for automating jobs


Tom Brown 2013-01-10, 22:03
Qiang Wang 2013-01-11, 01:31
Tom Brown 2013-01-11, 02:55
Qiang Wang 2013-01-11, 03:06
Tom Brown 2013-01-11, 03:17
Re: Best practice for automating jobs
Are you using the embedded metastore?
Only one process can connect to an embedded metastore at a time.
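[Background, not from the thread: the single-connection limit applies to the default embedded Derby metastore. A common way to lift it is a shared metastore backed by an external database, configured in hive-site.xml along these lines; host, database name, and credentials below are placeholders.]

```xml
<!-- hive-site.xml sketch: point the metastore at a shared MySQL database
     instead of the single-user embedded Derby store.
     "metastore-host", "hive_metastore", and the credentials are placeholders. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-host:3306/hive_metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>secret</value>
</property>
```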
2013/1/11 Tom Brown <[EMAIL PROTECTED]>

> When I've tried to create concurrent CLI sessions, I thought the 2nd
> one got an error about not being able to lock the metadata store.
>
> Is that error a real thing, or have I been mistaken this whole time?
>
> --Tom
>
>
> On Thursday, January 10, 2013, Qiang Wang wrote:
>
>> The HWI creates a CLI session for each query through the Hive
>> libraries, so several queries can run concurrently.
>>
>>
>> 2013/1/11 Tom Brown <[EMAIL PROTECTED]>
>>
>>> How is concurrency achieved with this solution?
>>>
>>>
>>> On Thursday, January 10, 2013, Qiang Wang wrote:
>>>
>>>> I believe the HWI (Hive Web Interface) can give you a hand.
>>>>
>>>> https://github.com/anjuke/hwi
>>>>
>>>> You can use the HWI to submit and run queries concurrently.
>>>> Partition management can be achieved by creating crontabs using the HWI.
>>>>
>>>> It's simple and easy to use. Hope it helps.
>>>>
>>>> Regards,
>>>> Qiang
>>>>
>>>>
>>>> 2013/1/11 Tom Brown <[EMAIL PROTECTED]>
>>>>
>>>>> All,
>>>>>
>>>>> I want to automate jobs against Hive (using an external table with
>>>>> ever growing partitions), and I'm running into a few challenges:
>>>>>
>>>>> Concurrency - If I run Hive as a thrift server, I can only safely run
>>>>> one job at a time. As such, it seems like my best bet will be to run
>>>>> it from the command line and setup a brand new instance for each job.
>>>>> That's quite a hassle for what seems like a common problem, so I
>>>>> want to know if there are any accepted patterns or best practices
>>>>> for this?
>>>>>
>>>>> Partition management - New partitions will be added regularly. If I
>>>>> have to setup multiple instances of Hive for each (potentially)
>>>>> overlapping job, it will be difficult to keep track of the partitions
>>>>> that have been added. In the context of the preceding question, what
>>>>> is the best way to add metadata about new partitions?
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>> --Tom
>>>>>
>>>>
>>>>
>>
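[Editor's note on the partition-management question above: one common pattern, sketched here and not taken from the thread, is to generate an idempotent ADD PARTITION statement per day and run it non-interactively, e.g. via `hive -e` from a cron job. The table name and path below are hypothetical.]

```python
from datetime import date, timedelta

def add_partition_ddl(table, dt, base_path):
    """Build an idempotent ADD PARTITION statement for one daily partition.

    IF NOT EXISTS makes the statement safe to re-run, so overlapping
    jobs that race to register the same partition do not fail.
    """
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt='{dt}') LOCATION '{base_path}/{dt}'"
    )

# Register yesterday's partition for a hypothetical external table.
yesterday = (date.today() - timedelta(days=1)).isoformat()
ddl = add_partition_ddl("events", yesterday, "/data/events")
print(ddl)
# The statement can then be run non-interactively, e.g.:
#   hive -e "$DDL"
```

A cron entry would invoke a small wrapper script that builds the statement for the current day and passes it to `hive -e`, so each scheduled job gets its own short-lived CLI session.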
Sean McNamara 2013-01-10, 22:11
Dean Wampler 2013-01-10, 22:30
Alexander Alten-Lorenz 2013-01-11, 07:23
Manish Malhotra 2013-01-11, 18:56
Tom Brown 2013-01-11, 22:58