From: Saurabh Nanda [mailto:[EMAIL PROTECTED]]
Sent: Sunday, July 19, 2009 11:38 PM
To: [EMAIL PROTECTED]
Subject: Re: dense_rank() equivalent in Hive?
Is there any other way to approach this problem? If I can ensure that a particular user's (sorted) data is processed on a single Hadoop node, then I can probably write a custom script to do the ranking for me.
I guess the answer to my query is given at http://wiki.apache.org/hadoop/Hive/LanguageManual/SortBy --
"Hive uses the columns in Distribute By to distribute the rows among reducers. All rows with the same Distribute By columns will go to the same reducer. Instead of specifying Cluster By, the user can specify Distribute By and Sort By, so the partition columns and sort columns can be different. The usual case is that the partition columns are a prefix of sort columns, but that is not required."
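So something along these lines might work: distribute by the user column, sort by user plus the ranking column, and stream each reducer's rows through a small script. Below is a rough, untested-on-a-cluster sketch of such a script. The table name (page_scores), the columns (user_id, score) and the file name (dense_rank.py) are made up for illustration, and the query in the docstring is only my reading of how Distribute By / Sort By would be combined with TRANSFORM. The script assumes rows arrive tab-separated and already sorted, and that one reducer may receive several users.

```python
"""Streaming dense_rank for rows that Hive has already grouped and sorted.

Hypothetical usage from Hive (table and column names are assumptions):

    ADD FILE dense_rank.py;
    FROM (
      SELECT user_id, score
      FROM page_scores
      DISTRIBUTE BY user_id
      SORT BY user_id, score
    ) s
    SELECT TRANSFORM(s.user_id, s.score)
    USING 'python dense_rank.py'
    AS (user_id, score, score_rank);
"""
import sys


def dense_rank(rows):
    """Yield (user, value, rank) for (user, value) pairs sorted by user, then value.

    The rank restarts at 1 for each new user and increases by 1 whenever the
    value changes, so equal values share a rank and no ranks are skipped.
    """
    prev_user = None
    prev_value = None
    rank = 0
    for user, value in rows:
        if user != prev_user:
            rank = 1          # first row of a new user
        elif value != prev_value:
            rank += 1         # same user, new distinct value
        prev_user, prev_value = user, value
        yield user, value, rank


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hive streams rows to the script as tab-separated text, one row per line.
    rows = (line.rstrip("\n").split("\t") for line in stdin if line.strip())
    for user, value, rank in dense_rank(rows):
        stdout.write(f"{user}\t{value}\t{rank}\n")


# Only stream when input is piped in, as it would be under Hive.
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

The script only compares each row with the previous one, so it relies entirely on the Sort By ordering within the reducer. Because a reducer can be handed more than one user, user_id has to lead the Sort By columns so that each user's rows arrive contiguously.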