RE: dense_rank() equivalent in Hive?

From: Saurabh Nanda [mailto:[EMAIL PROTECTED]]
Sent: Sunday, July 19, 2009 11:38 PM
Subject: Re: dense_rank() equivalent in Hive?
Is there any other way to approach this problem? If I can ensure that a particular user's (sorted) data is guaranteed to be processed on a single Hadoop node, then probably I can write a custom script to do the ranking for me.

I guess the answer to my query is given at http://wiki.apache.org/hadoop/Hive/LanguageManual/SortBy --

"Hive uses the columns in Distribute By to distribute the rows among reducers. All rows with the same Distribute By columns will go to the same reducer. Instead of specifying Cluster By, the user can specify Distribute By and Sort By, so the partition columns and sort columns can be different. The usual case is that the partition columns are a prefix of sort columns, but that is not required."
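
A minimal sketch of the custom-script idea, assuming the rows reach the script as tab-separated `user<TAB>value` lines that are already distributed by user and sorted by (user, value). The table name `scores`, the columns `user_id` and `score`, and the file name `dense_rank.py` are all made up for illustration, and the query in the comment is untested.

```python
import io
import sys

# Hypothetical invocation from Hive (names are assumptions, not from the thread):
#   ADD FILE dense_rank.py;
#   SELECT TRANSFORM(user_id, score) USING 'python dense_rank.py'
#     AS (user_id, score, rnk)
#   FROM (SELECT user_id, score FROM scores
#         DISTRIBUTE BY user_id SORT BY user_id, score) t;


def dense_rank(rows):
    """Yield (user, value, rank) for rows pre-sorted by (user, value).

    The rank restarts at 1 for each new user and increases by 1 only
    when the value changes, so ties share a rank and there are no gaps.
    """
    prev_user = None
    prev_value = None
    rank = 0
    for user, value in rows:
        if user != prev_user:
            rank = 1          # first row of a new user
        elif value != prev_value:
            rank += 1         # same user, new distinct value
        prev_user, prev_value = user, value
        yield user, value, rank


def main(stdin, stdout):
    """Read tab-separated lines, write the same lines with a rank appended."""
    def parse(lines):
        for line in lines:
            user, value = line.rstrip("\n").split("\t", 1)
            yield user, value

    for user, value, rank in dense_rank(parse(stdin)):
        stdout.write(f"{user}\t{value}\t{rank}\n")


# Small in-memory demo standing in for the rows a reducer would receive.
sample = io.StringIO("alice\t10\nalice\t10\nalice\t20\nbob\t5\n")
out = io.StringIO()
main(sample, out)
print(out.getvalue(), end="")
```

When run as an actual streaming script, the demo at the bottom would be replaced by `main(sys.stdin, sys.stdout)`. The script keeps only the previous row in memory, which works only because Distribute By sends all of a user's rows to one reducer and Sort By orders them within that reducer.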