Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Using Cluster by to improve Group by Performance


Copy link to this message
-
Re: Using Cluster by to improve Group by Performance
>From perspective of RS, two query is just different in hash code of RS key.
The cost of calculating hash of col3 and col4 might be negligible, I think.

2013/11/1 KayVajj <[EMAIL PROTECTED]>

> Any response or pointers to understand how Cluster By in sub queries can
> affect the performance/speed of outer queries is helpful.
>
> Thanks
> Kay
>
>
> On Mon, Oct 28, 2013 at 1:17 PM, KayVajj <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I have a question if I could use the cluster by clause in a sub query to
>> improve the performance of a group by query in hive
>>
>> Lets I have a Table A with columns (all strings) col1..col5 and the table
>> is not "Clustered"
>>
>> now I 'm trying to run the below query
>>
>> select
>>> col1,
>>> col2,
>>> col3,
>>> col4,
>>> concat_ws(',', collect_set(col5))
>>> from A
>>> group by
>>> col1,
>>> col2,
>>> col3,
>>> col4
>>
>>
>>
>> Would the below query optimize the above query and if not what is the
>> best practice to optimize this query. Assuming only col1 & col2 are the
>> uniquely identifying columns
>>
>>
>>
>>
>> select
>>> ct.col1,
>>> ct.col2,
>>> ct.col3,
>>> ct.col4,
>>> concat_ws(',', collect_set(ct.col5))
>>> from
>>> (
>>> select
>>> col1,
>>> col2,
>>> col3,
>>> col4,
>>> col5
>>> from A
>>> cluster by col1, col2
>>> ) ct
>>> group by
>>> ct.col1,
>>> ct.col2,
>>> ct.col3,
>>> ct.col4.
>>
>>
>> Thanks for your responses.
>>
>>
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB