Re: Performance tuning a hive query
A couple more to add to the list:

Indexing[1]
Columnar Storage/RCFile[2]

[1] https://cwiki.apache.org/confluence/display/Hive/IndexDev
[2] http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-4.pdf
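
To make the two items above concrete, here is a rough sketch of each. The
table and column names (page_views, page_views_rc, user_id, country,
view_date) and the index name are made up for illustration. Whether an index
is actually used at query time depends on your Hive version and on settings
such as hive.optimize.index.filter, so treat this as a starting point only.

```sql
-- Hypothetical table; all names here are assumptions for illustration.
-- Build a compact index on a column that is frequently filtered on.
CREATE INDEX idx_page_views_user
ON TABLE page_views (user_id)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- With DEFERRED REBUILD the index is empty until it is rebuilt explicitly.
ALTER INDEX idx_page_views_user ON page_views REBUILD;

-- Columnar storage: create an RCFile-backed copy of the same data.
CREATE TABLE page_views_rc (
  user_id   BIGINT,
  country   STRING,
  view_date STRING
)
STORED AS RCFILE;

INSERT OVERWRITE TABLE page_views_rc
SELECT user_id, country, view_date
FROM page_views;
```

Queries that touch only a few columns of page_views_rc should then read less
data than against a row-oriented text table.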

On Thu, Jul 19, 2012 at 8:39 AM, Jan Dolinár <[EMAIL PROTECTED]> wrote:

> There are many ways, but beware that some of them may result in worse
> performance when used inappropriately.
>
> Some of the settings we use to achieve faster queries:
> hive.map.aggr=true
> hive.exec.parallel=true
> hive.exec.compress.intermediate=true
> mapred.job.reuse.jvm.num.tasks=-1
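>
> In a Hive session or script these can be applied with SET before the
> query. The comments are a short summary of what each setting does; check
> the defaults for your version before relying on them.
>
> ```sql
> -- Do partial aggregation on the map side for GROUP BY queries.
> SET hive.map.aggr=true;
> -- Run independent stages of a query in parallel.
> SET hive.exec.parallel=true;
> -- Compress intermediate data passed between stages.
> SET hive.exec.compress.intermediate=true;
> -- Reuse each task JVM for an unlimited number of tasks of the same job.
> SET mapred.job.reuse.jvm.num.tasks=-1;
> ```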
>
> Structuring the queries properly can help a lot, for example by
> eliminating unneeded data early in the query, before further processing.
> E.g. if you use a subquery in FROM, you should put as many of the WHERE
> conditions as possible into the subquery, to reduce the amount of data
> passed to the next stage.
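>
> A minimal sketch of the subquery case, assuming a hypothetical table
> page_views with columns user_id, country and view_date:
>
> ```sql
> -- The filters sit inside the subquery, so only matching rows are
> -- passed on to the aggregation in the outer query.
> SELECT t.user_id, COUNT(*) AS views
> FROM (
>   SELECT user_id
>   FROM page_views
>   WHERE view_date = '2012-07-19'
>     AND country = 'US'
> ) t
> GROUP BY t.user_id;
> ```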
>
> Using multi-group-by queries helps a lot when running multiple queries
> on the same set of data.
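>
> For illustration, a multi-group-by query could look roughly like this.
> The source table page_views and the two target tables are hypothetical
> and are assumed to exist already with matching columns:
>
> ```sql
> -- The source table is named once in FROM and feeds both aggregations,
> -- instead of being queried separately for each result.
> FROM page_views pv
> INSERT OVERWRITE TABLE views_by_country
>   SELECT pv.country, COUNT(*)
>   GROUP BY pv.country
> INSERT OVERWRITE TABLE views_by_day
>   SELECT pv.view_date, COUNT(*)
>   GROUP BY pv.view_date;
> ```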
>
> As Nitin Pawar mentioned, JOINs can often be optimized as well.
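>
> One common JOIN optimization (an assumption on my side, not necessarily
> the one meant above) is a map-side join, where a small table is loaded
> into memory on each mapper so that no reduce phase is needed for the
> join. The tables page_views and countries are hypothetical:
>
> ```sql
> -- Hint that the small table (alias c) should be held in memory.
> SELECT /*+ MAPJOIN(c) */ pv.user_id, c.country_name
> FROM page_views pv
> JOIN countries c ON (pv.country = c.country_code);
> ```
>
> This only pays off when the hinted table comfortably fits in memory.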
>
> Also, fine-tuning the Hadoop server itself for your specific needs might
> help.
>
> I am very interested in optimization of queries as well, so if anyone
> knows some more tricks, please share...
>
> J. Dolinar
>
>
>
> On Thu, Jul 19, 2012 at 3:24 PM, Abhishek <[EMAIL PROTECTED]> wrote:
>
>>
>> Apart from partitions and buckets, how can I improve the performance of
>> Hive queries?
>>
>> Regards,
>> Abhi
>> Sent from my iPhone
>>
>
>
--
Swarnim