Hive 12 has added support for parallel ORDER BY, and I have a few questions about how it works. From the source code, I have figured out that to do a parallel ORDER BY, a partition table needs to be created, which is provided as input to TotalOrderPartitioner. To create the partition table, a sample of the Hive table is stored as an ArrayList of byte arrays and then sorted.
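To make the mechanism described above concrete, here is a minimal sketch in Java. It is not Hive's actual code: the class `PartitionKeySketch` and the methods `splitPoints` and `partitionFor` are hypothetical names, and the even-spacing rule for choosing split points is an assumption made for illustration. It only shows the general idea of holding a sample in memory as a list of byte arrays, sorting it, and picking `numReducers - 1` boundary keys that a total-order partitioner could use.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not taken from the Hive source.
public class PartitionKeySketch {

    // Lexicographic comparison of byte arrays, treating each byte as unsigned.
    static final Comparator<byte[]> BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    };

    // Sorts an in-memory sample and picks numReducers - 1 evenly spaced keys.
    // The whole sample is copied and held in memory, as in the description above.
    static List<byte[]> splitPoints(List<byte[]> sample, int numReducers) {
        if (numReducers < 1) {
            throw new IllegalArgumentException("numReducers must be >= 1");
        }
        List<byte[]> sorted = new ArrayList<>(sample);
        sorted.sort(BYTES);
        List<byte[]> splits = new ArrayList<>();
        if (sorted.isEmpty()) {
            return splits;
        }
        for (int i = 1; i < numReducers; i++) {
            int idx = (int) ((long) i * sorted.size() / numReducers);
            splits.add(sorted.get(idx));
        }
        return splits;
    }

    // Returns the partition index for a key: the number of split points <= key.
    // A linear scan is used here for simplicity.
    static int partitionFor(byte[] key, List<byte[]> splits) {
        int p = 0;
        while (p < splits.size() && BYTES.compare(key, splits.get(p)) >= 0) {
            p++;
        }
        return p;
    }
}
```

With split points chosen this way, every key routed to partition `p` sorts before every key routed to partition `p + 1`, so concatenating the sorted reducer outputs in partition order gives a total order.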
So I have the following questions:
1) Is my understanding correct?
2) Isn't it possible that storing the entire sample in memory would become a bottleneck when the sample size is large?

Thanks,
Vaibhav Jain