Hive user mailing list: ROW_NUMBER() equivalent in Hive


Re: ROW_NUMBER() equivalent in Hive
Ashutosh Chauhan 2013-02-22, 01:44
Hi Stephen,

As I indicated in my previous email, check out the file
ql/src/test/queries/clientpositive/ptf_general_queries.q; it has plenty of
example queries demonstrating the functionality that is available. If you are
interested in the Hive source changes that enabled this feature, you may want
to start with the patch attached to HIVE-896, which was the starting point for
this work. That JIRA also links to the other JIRAs we have done, or are doing,
on top of that patch.

Hope it helps,
Ashutosh
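
The ROW_NUMBER() Kumar asks about (in Teradata, ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)) splits rows into partitions by one or more columns, orders each partition by one or more columns, and numbers the rows 1, 2, 3, ... within each partition. The Python below is only a minimal sketch of those semantics, not Hive code; the function name row_number, the dict-per-row representation, and the sample column names are hypothetical, and it assumes ascending order on every ORDER BY column.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]  # hypothetical: one table row as column -> value


def row_number(rows: Sequence[Row], partition_by: Sequence[str],
               order_by: Sequence[str]) -> List[Row]:
    """Sketch of ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...).

    Returns copies of the input rows with an added 'row_number' field,
    numbered from 1 within each partition. Assumes ascending order on
    every order_by column.
    """
    # Group rows by the multi-column partition key.
    partitions: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for row in rows:
        partitions[tuple(row[c] for c in partition_by)].append(row)

    result: List[Row] = []
    for part in partitions.values():
        # Order each partition by the multi-column ordering key. sorted() is
        # stable, so rows tied on every order_by column keep their input order
        # (SQL itself leaves the numbering of such ties unspecified).
        ordered = sorted(part, key=lambda r: tuple(r[c] for c in order_by))
        for n, row in enumerate(ordered, start=1):
            result.append({**row, "row_number": n})
    return result


if __name__ == "__main__":
    sample = [
        {"region": "east", "dept": "a", "ts": 3, "amt": 10},
        {"region": "east", "dept": "a", "ts": 1, "amt": 20},
        {"region": "east", "dept": "b", "ts": 2, "amt": 5},
        {"region": "west", "dept": "a", "ts": 1, "amt": 7},
        {"region": "east", "dept": "a", "ts": 1, "amt": 5},
    ]
    for r in row_number(sample, ["region", "dept"], ["ts", "amt"]):
        print(r)
```

With this sample, the three rows in partition (east, a) are numbered 1, 2, 3 in (ts, amt) order, while partitions (east, b) and (west, a) each get a single row numbered 1.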

On Thu, Feb 21, 2013 at 12:17 PM, Stephen Boesch <[EMAIL PROTECTED]> wrote:

> Hi Ashutosh,
>    I am interested in reviewing your windowing feature. Can you be more
> specific about which (a) tests and (b) source files constitute your
> additions? (There are lots of files there ;) )
>
> thanks
>
> stephen boesch
>
>
> 2013/2/21 Ashutosh Chauhan <[EMAIL PROTECTED]>
>
>> Kumar,
>>
>> If you are willing to be on the bleeding edge, some of us are developing
>> this and many other partitioning and windowing features in a branch at:
>> https://svn.apache.org/repos/asf/hive/branches/ptf-windowing
>> Check out this branch and build Hive, and then you will have row_number()
>> functionality. Look at
>> ql/src/test/queries/clientpositive/ptf_general_queries.q, which has 60 or
>> so example queries demonstrating various capabilities we already have
>> working (including row_number).
>> We hope to have this branch merged into trunk soon.
>>
>> Hope it helps,
>> Ashutosh
>> On Wed, Feb 20, 2013 at 11:33 PM, kumar mr <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>>  This is Kumar, and this is my first question in this group.
>>>
>>>  I have a requirement to implement Teradata's ROW_NUMBER() in Hive,
>>> partitioning on multiple columns and ordering by multiple columns.
>>> It can easily be implemented in Hadoop MapReduce, but I have to do it in
>>> Hive. With a UDF I can assign a rank within each grouping key, given that
>>> the dataset is small, but the ordering needs to be done in a prior step.
>>> Can we do this in a much simpler way?
>>>
>>>  Thanks in advance.
>>>
>>>  Regards,
>>> Kumar
>>>
>>
>>
>
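
Kumar's UDF idea can be sketched the same way. If a prior step has already grouped and sorted the rows by the partition columns and then the ordering columns (in HiveQL, the kind of job DISTRIBUTE BY ... SORT BY is used for), a function called once per row only needs to remember the previous partition key: it increments a counter and resets it to 1 when the key changes. The Python below models that logic only; RowNumberCounter and number_presorted are hypothetical names, not Hive's UDF API, and the approach is only correct if every row of a partition reaches the same function instance consecutively, which is why the ordering has to happen first.

```python
from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple

Row = Dict[str, Any]  # hypothetical: one table row as column -> value


class RowNumberCounter:
    """Hypothetical stateful per-row function: returns 1, 2, 3, ... and
    restarts at 1 whenever the partition key differs from the previous call.
    Only correct when rows arrive grouped and ordered by partition key."""

    def __init__(self) -> None:
        self._last_key: Optional[Tuple[Any, ...]] = None
        self._count = 0

    def evaluate(self, *key: Any) -> int:
        if key != self._last_key:
            # New partition: reset the counter.
            self._last_key = key
            self._count = 0
        self._count += 1
        return self._count


def number_presorted(rows: Iterable[Row], partition_cols: Sequence[str],
                     order_cols: Sequence[str]) -> List[Row]:
    """Stand-in for the 'prior step' plus the UDF: sort by partition columns,
    then ordering columns (ascending), and feed each row to one counter."""
    sort_cols = list(partition_cols) + list(order_cols)
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in sort_cols))
    counter = RowNumberCounter()
    return [
        {**r, "row_number": counter.evaluate(*(r[c] for c in partition_cols))}
        for r in ordered
    ]
```

Feeding the same counter unsorted keys such as a, b, a yields 1, 1, 1, so the result silently depends on the ordering step having been done correctly.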