Hive, mail # user - Hive Queries Performance Tuning - Map side joins, Map side aggregations, Partitioning/Clustering


Re: Hive Queries Performance Tuning - Map side joins, Map side aggregations, Partitioning/Clustering
Nitin Pawar 2012-04-01, 18:45
Anand,

The best place to understand join queries in Hive is the presentation
by Namit Jain from Facebook.

Here is the PDF:
https://cwiki.apache.org/Hive/presentations.data/Hive%20Summit%202011-join.pdf

You can also search for the video on YouTube. It is very well described.
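
On the map-side aggregation numbers, here is a rough sketch of how the
comparison could be isolated. It rests on an assumption worth verifying with
EXPLAIN on your version: when the query contains count(distinct ...), the
map-side hash table is keyed on emp_id plus customer_id rather than emp_id
alone, so partial aggregation collapses far fewer rows before the shuffle.
Dropping the distinct temporarily should make the effect of hive.map.aggr
easier to see. The statements reuse only the table and columns from the
quoted query.

```sql
-- Keep automatic map-join conversion off so that only aggregation changes.
set hive.auto.convert.join=false;

-- Run 1: all aggregation happens in the reducers.
set hive.map.aggr=false;
select a11.emp_id, count(1), sum(a11.qty_sold)
from orderdetailrcfile a11
where a11.order_date = '01-01-2008'
group by a11.emp_id;

-- Run 2: mappers pre-aggregate per emp_id in an in-memory hash table,
-- so fewer rows should be shuffled to the reducers.
set hive.map.aggr=true;
select a11.emp_id, count(1), sum(a11.qty_sold)
from orderdetailrcfile a11
where a11.order_date = '01-01-2008'
group by a11.emp_id;

-- Inspect the plan of the original query to see which keys the
-- map-side group-by operator actually uses.
explain
select a11.emp_id, count(1), count(distinct a11.customer_id),
       sum(a11.qty_sold)
from orderdetailrcfile a11
where a11.order_date = '01-01-2008'
group by a11.emp_id;
```

In general, map-side aggregation tends to help most when the group by
produces few distinct keys relative to the number of input rows.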

On Sun, Apr 1, 2012 at 11:59 PM, Ladda, Anand <[EMAIL PROTECTED]> wrote:

>  I am trying to understand what options/settings are available to tune
> the performance of Hive queries. I have seen the benefits of map-side joins
> and partitioning/clustering. However, I have yet to see the impact that
> map-side aggregation has on query performance. I tried running this query
> with and without map-side aggregation turned on and did not see much
> difference in the execution times. The raw data in this partition is about
> 5.5 million. I am looking for some pointers on what types of queries
> benefit from map-side aggregation.
>
> set hive.auto.convert.join=false;
>
> set hive.map.aggr=false;
>
> Non-partitioned, non-clustered single table with where clause on date and
> no map-side aggregation:
>
> select a11.emp_id, count(1), count(distinct a11.customer_id),
> sum(a11.qty_sold) from orderdetailrcfile a11 where order_date = '01-01-2008'
> group by a11.emp_id;
>
> 400 secs
>
> set hive.map.aggr=true;
>
> Non-partitioned, non-clustered single table with where clause on date and
> map-side aggregation:
>
> select a11.emp_id, count(1), count(distinct a11.customer_id),
> sum(a11.qty_sold) from orderdetailrcfile a11 where order_date = '01-01-2008'
> group by a11.emp_id;
>
> 390 secs
>
> Also, is there any reason not to turn on map-side joins all the time? In my
> tests I have always seen the performance either stay the same or improve
> with map-side joins turned on. Are there any other parameters or Hive
> features that can help improve the performance of Hive queries?
>
> Thanks
>
> Anand
>
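
On leaving map-side joins on all the time: the main caveat I know of is
memory, since the small table has to fit in each task's memory as a hash
table. Below is a minimal sketch of the two usual ways to request a map join.
The table dim_employee and the column emp_name are hypothetical names used
only for illustration, and property names and defaults vary between Hive
versions, so check the configuration documentation for your release.

```sql
-- Let Hive convert eligible joins to map joins automatically.
set hive.auto.convert.join=true;

-- Size threshold in bytes below which a table is treated as "small".
-- 25000000 is the documented default in the versions I have seen;
-- older releases used a different property name.
set hive.mapjoin.smalltable.filesize=25000000;

-- Explicit hint form: ask Hive to load alias e into memory on each mapper.
-- dim_employee and emp_name are hypothetical.
select /*+ MAPJOIN(e) */ e.emp_name, sum(o.qty_sold)
from orderdetailrcfile o
join dim_employee e on (o.emp_id = e.emp_id)
group by e.emp_name;
```

If the table on the hinted side is too large for memory, the map join can
fail or fall back to a regular join, which is the usual reason not to force
it everywhere.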

--
Nitin Pawar