Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Hive user mailing list: Hive sort by using a single reducer


Binesh Gummadi 2012-09-02, 17:53
Re: Hive sort by using a single reducer
Sort by does not have the single-reducer restriction. I'm not sure which rank
you are using, but any of them should let you sort and rank if the query is
written correctly. Our implementation on my GitHub
(github.com/edwardcapriolo) allows this.
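The distinction the reply draws between SORT BY and ORDER BY can be illustrated with a toy Python simulation. This is not Hive code and makes several assumptions: rows are hypothetical (row_id, rank) pairs, rows are assigned to reducers by row_id modulo the reducer count (a simplified stand-in for Hive's hash partitioning), and `partition`, `sort_by` and `order_by` are illustrative names. Under SORT BY each reducer sorts only its own share, so the work runs in parallel but the combined output is not globally ordered; ORDER BY needs a single reducer that sees every row.

```python
from typing import List, Tuple

# Hypothetical row shape: (row_id, rank), standing in for data_dump rows.
Row = Tuple[int, int]


def partition(rows: List[Row], num_reducers: int) -> List[List[Row]]:
    """Send each row to a reducer by row_id modulo the reducer count
    (a simplified stand-in for Hive's hash partitioning)."""
    buckets: List[List[Row]] = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[row[0] % num_reducers].append(row)
    return buckets


def sort_by(rows: List[Row], num_reducers: int) -> List[List[Row]]:
    """SORT BY rank DESC: each reducer sorts only its own bucket, in parallel."""
    return [sorted(b, key=lambda r: r[1], reverse=True)
            for b in partition(rows, num_reducers)]


def order_by(rows: List[Row]) -> List[Row]:
    """ORDER BY rank DESC: one reducer sees every row and sorts them all."""
    return sorted(rows, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    rows = [(i, (i * 37) % 100) for i in range(10)]
    per_reducer = sort_by(rows, num_reducers=2)
    # Each bucket is sorted, but concatenating the buckets is not a global order.
    print([r[1] for bucket in per_reducer for r in bucket])
    # [96, 74, 48, 22, 0, 85, 59, 37, 33, 11]
    print([r[1] for r in order_by(rows)])
    # [96, 85, 74, 59, 48, 37, 33, 22, 11, 0]
```

Adding reducers speeds up SORT BY, but getting one total order (or a global "top N") still requires a step that compares rows across reducers.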
On Sunday, September 2, 2012, Binesh Gummadi <[EMAIL PROTECTED]> wrote:
> I am trying to insert data into a table after selecting and sorting by a
> column. What I really want is to order by a column and select the top million
> rows. I am using Hive on Amazon EMR to process the data.
> Here is my query:
> INSERT INTO TABLE ddb_table SELECT * FROM data_dump sort by rank desc LIMIT 1000000;
> It creates two jobs. The first job runs rather quickly, but the second job's
> reducer runs forever because that job uses a single reducer. Here is my
> question on Stack Overflow
> (http://stackoverflow.com/questions/12233343/why-is-sort-by-always-using-single-reducer).
> According to the docs, the "order by" clause is limited to one reducer. Does
> "sort by" have the same limitation? Are there any other ways of meeting the above
> requirement?
> ________________________________
> Binesh Gummadi
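The two jobs described in the question are consistent with how a SORT BY ... LIMIT query is commonly planned. As we understand it, the first job's reducers each sort their share and keep at most N rows, then a second job sends those survivors to one reducer to pick the global top N; running EXPLAIN on the query is the way to confirm the plan for a given Hive version. The Python below is a toy simulation of that plan, not Hive code. It assumes hypothetical (row_id, rank) rows, partitions them by row_id modulo the reducer count, uses illustrative function names, and lets `heapq.merge` stand in for the final reducer's merge. It also counts the rows that reach the single reducer: up to num_reducers × N, so with N = 1,000,000 that last step stays large no matter how many reducers the first job uses.

```python
import heapq
import itertools
from typing import List, Tuple

# Hypothetical row shape: (row_id, rank), standing in for data_dump rows.
Row = Tuple[int, int]


def rank_key(row: Row) -> int:
    return row[1]


def first_job_reducer(bucket: List[Row], n: int) -> List[Row]:
    """Job 1 (many reducers in parallel): sort this bucket by rank DESC, keep at most n."""
    return sorted(bucket, key=rank_key, reverse=True)[:n]


def second_job_reducer(runs: List[List[Row]], n: int) -> List[Row]:
    """Job 2 (one reducer): merge the descending runs and keep the global top n."""
    merged = heapq.merge(*runs, key=rank_key, reverse=True)
    return list(itertools.islice(merged, n))


def top_n(rows: List[Row], num_reducers: int, n: int) -> Tuple[List[Row], int]:
    """Return the global top n rows and how many rows reached the single reducer."""
    buckets: List[List[Row]] = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[row[0] % num_reducers].append(row)  # simplified partitioning
    runs = [first_job_reducer(b, n) for b in buckets]
    rows_into_single_reducer = sum(len(run) for run in runs)
    return second_job_reducer(runs, n), rows_into_single_reducer


if __name__ == "__main__":
    rows = [(i, (i * 37) % 100) for i in range(10)]
    best, funnelled = top_n(rows, num_reducers=2, n=3)
    print(best)       # [(8, 96), (5, 85), (2, 74)]
    print(funnelled)  # 6: up to num_reducers * n rows reach one reducer
```

Limiting inside each first-job reducer keeps the final reducer's input bounded, but the bound grows with N, so in this plan the last step for a million-row LIMIT remains serial.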
Binesh Gummadi 2012-09-02, 21:38
Ruslan Al-Fakikh 2012-09-04, 18:55