Hive, mail # user - Performance tuning a hive query


Re: Performance tuning a hive query
Jan Dolinár 2012-07-19, 13:39
There are many ways, but beware that some of them may result in worse
performance when used inappropriately.

Some of the settings we use to achieve faster queries:
hive.map.aggr=true
hive.exec.parallel=true
hive.exec.compress.intermediate=true
mapred.job.reuse.jvm.num.tasks=-1
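As a sketch, the settings above can be applied per session with SET statements at the top of a Hive script. The comments are a brief reading of what each property does; whether each one helps depends on the workload.

```sql
-- Do partial aggregation on the map side for GROUP BY queries
SET hive.map.aggr=true;
-- Allow independent stages of one query to run in parallel
SET hive.exec.parallel=true;
-- Compress data passed between the MapReduce jobs of a multi-stage query
SET hive.exec.compress.intermediate=true;
-- Reuse each JVM for an unlimited number of tasks of the same job (-1 = no limit)
SET mapred.job.reuse.jvm.num.tasks=-1;
```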

Structuring the queries properly can help a lot, for example by eliminating
unneeded data early in the query, before further processing. E.g. if you use
a subquery in FROM, you should put all WHERE clauses into the subquery where
possible, to reduce the amount of data passed to the next stage.
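A minimal illustration of filtering inside the subquery. The tables page_views(user_id, url, dt) and users(id, country) are hypothetical and used only for this example.

```sql
-- Hypothetical tables: page_views(user_id, url, dt), users(id, country)
SELECT u.country, COUNT(*) AS views
FROM (
  -- Filter and project here, so only the needed rows and columns reach the join
  SELECT user_id
  FROM page_views
  WHERE dt = '2012-07-18'
    AND url LIKE '/products/%'
) pv
JOIN users u ON (pv.user_id = u.id)
GROUP BY u.country;
```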

Using multi-group-by queries helps a lot when running multiple queries on
the same set of data.
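One way to write such a query is Hive's multi-insert form, where a single FROM clause feeds several aggregations. This sketch assumes a hypothetical source table page_views(user_id, url, dt) and two hypothetical, pre-created target tables, views_by_url and views_by_user.

```sql
-- One pass over page_views feeds both aggregations
FROM page_views pv
INSERT OVERWRITE TABLE views_by_url
  SELECT pv.url, COUNT(*)
  WHERE pv.dt = '2012-07-18'
  GROUP BY pv.url
INSERT OVERWRITE TABLE views_by_user
  SELECT pv.user_id, COUNT(*)
  WHERE pv.dt = '2012-07-18'
  GROUP BY pv.user_id;
```

Written as two separate queries, the same work would read the source table twice.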

As Nitin Pawar mentioned, JOINs can often be optimized as well.
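This message does not say which JOIN optimizations are meant. One common option, shown here only as an assumption, is a map-side join when one table is small enough to fit in memory. The tables page_views and users are hypothetical.

```sql
-- Option 1: let Hive convert joins against small tables to map joins automatically
SET hive.auto.convert.join=true;

-- Option 2: request a map join explicitly; assumes users (alias u) fits in memory
SELECT /*+ MAPJOIN(u) */ pv.url, u.country
FROM page_views pv
JOIN users u ON (pv.user_id = u.id);
```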

Also, fine-tuning the Hadoop server itself for your specific needs might
help.

I am very interested in optimization of queries as well, so if anyone knows
some more tricks, please share...

J. Dolinar

On Thu, Jul 19, 2012 at 3:24 PM, Abhishek <[EMAIL PROTECTED]> wrote:

> Apart from partitions and buckets, how can I improve the performance
> of Hive queries?
>
> Regards,
> Abhi