Hive, mail # user - Best practice for automating jobs


Messages in this thread:
Tom Brown 2013-01-10, 22:03
Qiang Wang 2013-01-11, 01:31
Tom Brown 2013-01-11, 02:55
Qiang Wang 2013-01-11, 03:06
Tom Brown 2013-01-11, 03:17
Qiang Wang 2013-01-11, 03:22
Sean McNamara 2013-01-10, 22:11
Dean Wampler 2013-01-10, 22:30
Alexander Alten-Lorenz 2013-01-11, 07:23
Re: Best practice for automating jobs
Manish Malhotra 2013-01-11, 18:56
When you use the Hive CLI libraries, locking is handled internally via
ZooKeeper (or whatever lock manager is configured), so no extra effort is
required on your side.

Note, though, that there is a patch for HiveServer leaking ZooKeeper
connections (HIVE-3723), which people are trying out on 0.9 and 0.10.

Regards,
Manish
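[Editor's note: the per-job CLI approach discussed in this thread can be sketched as a small wrapper script. The table and partition names below (`web_logs`, `dt`) are illustrative only, not from the thread; each invocation spawns its own short-lived Hive CLI process, so jobs never contend for a shared HiveServer.]

```shell
# Build the HiveQL for one job: register the partition idempotently,
# then run the job's query. POSIX sh; names are placeholders.
build_hql() {
  dt="$1"   # partition value, e.g. 2013-01-10
  printf "ALTER TABLE web_logs ADD IF NOT EXISTS PARTITION (dt='%s') LOCATION '/data/web_logs/dt=%s';\n" "$dt" "$dt"
  printf "SELECT COUNT(*) FROM web_logs WHERE dt='%s';\n" "$dt"
}

# Each job gets its own Hive CLI process (uncomment for a real run):
#   hive -e "$(build_hql 2013-01-10)"
```

`ADD IF NOT EXISTS` makes the partition registration idempotent, so two overlapping jobs can both attempt it without failing, which addresses the partition-tracking concern raised below.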
On Thu, Jan 10, 2013 at 11:23 PM, Alexander Alten-Lorenz <
[EMAIL PROTECTED]> wrote:

> +1
>
> This is the best solution to automate jobs.
>
> cheers,
>  Alex
>
> On Jan 10, 2013, at 11:11 PM, Sean McNamara <[EMAIL PROTECTED]>
> wrote:
>
> >> I want to know if there are any accepted patterns or best practices for
> >> this?
> >
> > http://oozie.apache.org/
> >
> >
> >
> >> New partitions will be added regularly
> >
> > What type of partitions are you adding? Why frequently?
> >
> >
> >
> >
> > Sean
> >
> >
> > On 1/10/13 3:03 PM, "Tom Brown" <[EMAIL PROTECTED]> wrote:
> >
> >> All,
> >>
> >> I want to automate jobs against Hive (using an external table with
> >> ever growing partitions), and I'm running into a few challenges:
> >>
> >> Concurrency - If I run Hive as a thrift server, I can only safely run
> >> one job at a time. As such, it seems like my best bet will be to run
> >> it from the command line and set up a brand new instance for each job.
> >> That's quite a bit of hassle to solve a seemingly common problem, so
> >> I want to know if there are any accepted patterns or best practices
> >> for this?
> >>
> >> Partition management - New partitions will be added regularly. If I
> >> have to set up multiple instances of Hive for each (potentially)
> >> overlapping job, it will be difficult to keep track of the partitions
> >> that have been added. In the context of the preceding question, what
> >> is the best way to add metadata about new partitions?
> >>
> >> Thanks in advance!
> >>
> >> --Tom
> >
>
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>
>
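[Editor's note: the Oozie suggestion made above usually takes the form of a coordinator that fires a workflow on a schedule and passes the nominal date in as the partition value. The fragment below is only a sketch; the app name, HDFS path, and property name are placeholders.]

```xml
<coordinator-app name="hive-job-coord" frequency="${coord:days(1)}"
                 start="2013-01-10T00:00Z" end="2014-01-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
  <action>
    <workflow>
      <!-- Workflow that runs the Hive job for one partition -->
      <app-path>hdfs://namenode/apps/hive-job</app-path>
      <configuration>
        <property>
          <!-- Hand the nominal date to the workflow as the partition value -->
          <name>dt</name>
          <value>${coord:formatTime(coord:nominalTime(), 'yyyy-MM-dd')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```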
Tom Brown 2013-01-11, 22:58