Hive, mail # user - Query regarding Hive Parallel Orderby - 2014-02-21, 04:32

Query regarding Hive Parallel Orderby

Hive 0.12 added support for parallel order by, and I have a few
questions about how it works.
From the source code, I gather that a parallel order by requires a
partition table to be created, which is then provided as input to
TotalOrderPartitioner. To create the partition table, a sample of the
Hive table is stored as an ArrayList of byte arrays and then sorted.
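
A minimal sketch of the step as described above, to make the question concrete. This is not Hive's actual code: the class name PartitionKeySketch, the method splitPoints, and the unsigned lexicographic comparator are all assumptions made for illustration. It sorts an in-memory sample of serialized keys and picks numReducers - 1 evenly spaced boundary keys, which is the kind of input a total-order partitioner needs.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not taken from the Hive source.
public class PartitionKeySketch {

    // Compare byte arrays lexicographically, treating bytes as unsigned.
    static final Comparator<byte[]> UNSIGNED_LEX = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    };

    // Sort the whole sample in memory, then take numReducers - 1 cut points.
    static List<byte[]> splitPoints(List<byte[]> sample, int numReducers) {
        List<byte[]> sorted = new ArrayList<>(sample);
        sorted.sort(UNSIGNED_LEX);

        List<byte[]> cuts = new ArrayList<>();
        if (sorted.isEmpty()) {
            return cuts; // nothing sampled, so no boundaries
        }
        for (int i = 1; i < numReducers; i++) {
            // Evenly spaced index into the sorted sample.
            int idx = (int) ((long) i * sorted.size() / numReducers);
            cuts.add(sorted.get(idx));
        }
        return cuts;
    }

    public static void main(String[] args) {
        List<byte[]> sample = new ArrayList<>();
        for (String s : new String[] {"d", "a", "c", "b", "f", "e", "h", "g"}) {
            sample.add(s.getBytes(StandardCharsets.UTF_8));
        }
        for (byte[] cut : splitPoints(sample, 4)) {
            System.out.println(new String(cut, StandardCharsets.UTF_8));
        }
    }
}
```

In this sketch the entire sample lives in one list while it is sorted, so memory use grows with the number of sampled keys times their serialized size.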

So I have the following questions:

1)  Is my understanding correct?

2) Couldn't storing the entire sample in memory become a bottleneck
when the sample size is large?
Vaibhav Jain
