

Query regarding Hive Parallel Orderby
Hi,

Hive 0.12 added support for parallel ORDER BY. I have a few
questions about how it works.
From the source code, I gather that a parallel ORDER BY requires a
partition table to be created,
which is then provided as input to TotalOrderPartitioner. To create the
partition table, a sample of
the Hive table is stored as an ArrayList of byte arrays and then sorted.
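
To make the question concrete, here is a rough sketch of the flow as I understand it: keep the sampled keys in an in-memory list, sort them, and pick (numReducers - 1) evenly spaced cut points for the partition table. This is not Hive's actual code. The class name PartitionKeySketch, the method pickSplitPoints, and the unsigned byte-wise comparison are all my own assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not taken from the Hive source tree.
public class PartitionKeySketch {

    // Assumed ordering: unsigned, byte-by-byte lexicographic comparison.
    static final Comparator<byte[]> BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // Sorts the whole sample in memory and returns numReducers - 1 cut points.
    static byte[][] pickSplitPoints(List<byte[]> sample, int numReducers) {
        if (numReducers < 1) {
            throw new IllegalArgumentException("numReducers must be >= 1");
        }
        if (sample.isEmpty() && numReducers > 1) {
            throw new IllegalArgumentException("sample must not be empty");
        }
        // The entire sample is held in one array here, which is the
        // memory concern raised in question 2.
        byte[][] sorted = sample.toArray(new byte[0][]);
        Arrays.sort(sorted, BYTES);

        byte[][] cuts = new byte[numReducers - 1][];
        for (int i = 1; i < numReducers; i++) {
            // Evenly spaced positions in the sorted sample.
            int idx = (int) ((long) i * sorted.length / numReducers);
            cuts[i - 1] = sorted[idx];
        }
        return cuts;
    }

    public static void main(String[] args) {
        List<byte[]> sample = new ArrayList<>();
        for (String key : new String[] {"d", "a", "c", "b", "f", "e"}) {
            sample.add(key.getBytes(StandardCharsets.UTF_8));
        }
        for (byte[] cut : pickSplitPoints(sample, 3)) {
            System.out.println(new String(cut, StandardCharsets.UTF_8));
        }
    }
}
```

In this sketch, memory use grows linearly with the number and size of the sampled keys, since every key stays in the list until the sort finishes.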

So I have the following questions:

1)  Is my understanding correct?

2) Couldn't storing the entire sample in memory
become a bottleneck when the sample size is large?
Thanks
Vaibhav Jain

 