Hive >> mail # user >> Performance tuning a hive query


Re: Performance tuning a hive query
A couple more to add to the list:

Indexing[1]
Columnar Storage/RCFile[2]

[1] https://cwiki.apache.org/confluence/display/Hive/IndexDev
[2] http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-4.pdf
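To make the two suggestions concrete, here is a toy Python model, not Hive code. Rows are split into row groups and each group is stored column by column (the basic idea behind RCFile's layout), and a small value-to-row-group map stands in for an index, loosely in the spirit of Hive's compact index, which records where each value occurs. The data, column positions, group size and function names are all hypothetical; real RCFile files and Hive indexes work on HDFS files and blocks, not Python lists.

```python
from collections import defaultdict

# Hypothetical rows: (user_id, country, amount)
ROWS = [
    (1, "US", 10), (2, "DE", 25), (3, "US", 5),
    (4, "FR", 40), (5, "DE", 15), (6, "US", 30),
]
COUNTRY, AMOUNT = 1, 2  # column positions (hypothetical schema)

def to_row_groups(rows, group_size):
    """Split rows into row groups; inside each group, store values column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        groups.append([list(col) for col in zip(*chunk)])  # transpose rows -> columns
    return groups

def build_index(groups, col):
    """Map each value of column `col` to the ids of the row groups that contain it."""
    index = defaultdict(set)
    for gid, columns in enumerate(groups):
        for value in columns[col]:
            index[value].add(gid)
    return index

def sum_where(groups, index, filter_col, value, sum_col):
    """SUM(sum_col) WHERE filter_col = value.

    Only row groups listed in the index are visited, and only the two needed
    columns of each group are touched. Returns (total, row_groups_read).
    """
    total, groups_read = 0, 0
    for gid in sorted(index.get(value, ())):
        columns = groups[gid]
        groups_read += 1
        for f, s in zip(columns[filter_col], columns[sum_col]):
            if f == value:
                total += s
    return total, groups_read

groups = to_row_groups(ROWS, group_size=2)
idx = build_index(groups, COUNTRY)
print(sum_where(groups, idx, COUNTRY, "FR", AMOUNT))  # prints: (40, 1)
```

Only one of the three row groups is visited for "FR", and the user_id column is never read; the payoff grows with wide tables and selective filters.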

On Thu, Jul 19, 2012 at 8:39 AM, Jan Dolinár <[EMAIL PROTECTED]> wrote:

> There are many ways, but beware that some of them may result in worse
> performance when used inappropriately.
>
> Some of the settings we use to achieve faster queries:
> hive.map.aggr=true
> hive.exec.parallel=true
> hive.exec.compress.intermediate=true
> mapred.job.reuse.jvm.num.tasks=-1
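The first setting, hive.map.aggr, enables map-side aggregation: each mapper partially aggregates its own input in memory before the shuffle, so fewer records travel to the reducers. The toy Python model below only illustrates that effect for a COUNT per key over two hypothetical input splits; it is not how Hive implements the feature internally.

```python
from collections import Counter

def mapper_output(records, map_side_aggr):
    """Return the (key, count) pairs a single mapper would send to the shuffle."""
    if not map_side_aggr:
        # Without map-side aggregation: one pair per input record.
        return [(key, 1) for key in records]
    # With map-side aggregation: partial counts from an in-memory hash table.
    return list(Counter(records).items())

def reduce_counts(pairs):
    """Reducer side: sum the (partial) counts per key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

splits = [["a", "b", "a", "a"], ["b", "b", "c"]]  # hypothetical splits, one per mapper
for aggr in (False, True):
    shuffled = [pair for split in splits for pair in mapper_output(split, aggr)]
    print(aggr, len(shuffled), reduce_counts(shuffled))
# prints:
# False 7 {'a': 3, 'b': 3, 'c': 1}
# True 4 {'a': 3, 'b': 3, 'c': 1}
```

The final counts are identical; only the volume of shuffled data changes, which is why this setting usually helps when keys repeat a lot within each split.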
>
> Structuring the queries properly can help a lot, for example by eliminating
> unneeded data early in the query, before further processing. E.g. if you
> use a subquery in FROM, put as many of the WHERE clauses as possible into
> the subquery to reduce the amount of data passed to the next stage.
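A toy Python model of that advice, assuming a two-stage query where a subquery in FROM feeds an outer aggregation: the answer is the same either way, but filtering inside the subquery shrinks what is handed to the next stage. The rows, the `run_query` helper and the `only_us` predicate are hypothetical.

```python
# Hypothetical rows: (user_id, country, amount)
ROWS = [(1, "US", 10), (2, "DE", 25), (3, "US", 5), (4, "FR", 40)]

def run_query(rows, keep, filter_in_subquery):
    """Model of SELECT SUM(amount) FROM (SELECT country, amount FROM t ...) s WHERE ...

    Returns (result, rows_passed), where rows_passed is how many rows the
    subquery stage hands to the outer stage.
    """
    # Stage 1: the subquery in FROM projects the columns the outer query needs.
    stage1 = [(country, amount) for _, country, amount in rows]
    if filter_in_subquery:
        stage1 = [r for r in stage1 if keep(r)]  # WHERE applied early
    rows_passed = len(stage1)
    # Stage 2: the outer query filters late (if not done already) and aggregates.
    if not filter_in_subquery:
        stage1 = [r for r in stage1 if keep(r)]
    return sum(amount for _, amount in stage1), rows_passed

def only_us(row):
    return row[0] == "US"

print(run_query(ROWS, only_us, filter_in_subquery=False))  # prints: (15, 4)
print(run_query(ROWS, only_us, filter_in_subquery=True))   # prints: (15, 2)
```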
>
> Using multi-group-by queries helps a lot when computing multiple queries
> on the same set of data.
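In HiveQL a multi-group-by query uses a single FROM clause followed by several INSERT ... SELECT ... GROUP BY clauses, so the source is scanned once for all of them. The Python sketch below models only that benefit, counting scans of a hypothetical in-memory table for two separate aggregations versus one pass that feeds both; `CountingTable` and the row layout are illustrative assumptions.

```python
from collections import defaultdict

class CountingTable:
    """Hypothetical in-memory table that counts how many full scans are made."""

    def __init__(self, rows):
        self.rows = rows
        self.scans = 0

    def scan(self):
        self.scans += 1
        return iter(self.rows)

def separate_queries(table):
    """Two independent GROUP BY queries: each one scans the table."""
    by_country = defaultdict(int)
    for _, country, amount in table.scan():
        by_country[country] += amount
    by_user = defaultdict(int)
    for user, _, amount in table.scan():
        by_user[user] += amount
    return dict(by_country), dict(by_user)

def multi_group_by(table):
    """One scan feeding both aggregations, as a multi-group-by query does."""
    by_country = defaultdict(int)
    by_user = defaultdict(int)
    for user, country, amount in table.scan():
        by_country[country] += amount
        by_user[user] += amount
    return dict(by_country), dict(by_user)

rows = [("u1", "US", 10), ("u2", "DE", 25), ("u1", "US", 5)]  # (user, country, amount)
t1, t2 = CountingTable(rows), CountingTable(rows)
assert separate_queries(t1) == multi_group_by(t2)
print(t1.scans, t2.scans)  # prints: 2 1
```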
>
> As Nitin Pawar mentioned, JOINs can often be optimized as well.
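Nitin Pawar's message is not quoted here, so which join optimizations he meant is not known. One common option in Hive is the map join (requested with the MAPJOIN hint, or chosen automatically when hive.auto.convert.join is enabled): the smaller table is loaded into an in-memory hash table on each mapper and the large table is streamed past it, so the join needs no reduce phase. A toy Python model of that build/probe pattern, with hypothetical tables:

```python
# Hypothetical tables: a large fact table and a small dimension table.
ORDERS = [(1, "US", 10), (2, "DE", 25), (3, "JP", 7)]     # (order_id, country, amount)
COUNTRIES = [("US", "United States"), ("DE", "Germany")]  # (code, name)

def map_join(large_rows, small_rows, large_key, small_key):
    """Inner equi-join: hash the small table in memory, then stream the large one."""
    # Build phase: small table -> {key: [rows]} (a key may occur more than once).
    table = {}
    for row in small_rows:
        table.setdefault(row[small_key], []).append(row)
    # Probe phase: each large-table row is matched locally; unmatched rows are dropped.
    out = []
    for row in large_rows:
        for match in table.get(row[large_key], ()):
            out.append(row + match)
    return out

for joined in map_join(ORDERS, COUNTRIES, large_key=1, small_key=0):
    print(joined)
# prints:
# (1, 'US', 10, 'US', 'United States')
# (2, 'DE', 25, 'DE', 'Germany')
```

The trade-off is memory: this pattern only pays off when the small table fits comfortably in each mapper's heap.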
>
> Also, fine-tuning the Hadoop server itself for your specific needs might
> help.
>
> I am very interested in optimization of queries as well, so if anyone
> knows some more tricks, please share...
>
> J. Dolinar
>
>
>
> On Thu, Jul 19, 2012 at 3:24 PM, Abhishek <[EMAIL PROTECTED]> wrote:
>
>>
>> Apart from partitions and buckets, how can I improve the performance of
>> Hive queries?
>>
>> Regards
>> Abhi
>> Sent from my iPhone
>>
>
>
--
Swarnim