Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> Re: Lag function in Hive


+
karanveer.singh@... 2012-04-10, 14:51
+
Butani, Harish 2012-04-10, 15:10
+
Ashutosh Chauhan 2012-04-11, 14:54
+
Butani, Harish 2012-04-11, 21:39
+
David Kulp 2012-04-10, 14:56
+
Philip Tromans 2012-04-10, 15:02
+
David Kulp 2012-04-10, 15:07
+
karanveer.singh@... 2012-04-10, 13:44
+
Philip Tromans 2012-04-10, 14:17
+
Hamilton, Robert 2012-04-10, 15:01
+
karanveer.singh@... 2012-04-11, 08:15
+
Mark Grover 2012-04-11, 13:31
+
karanveer.singh@... 2012-04-10, 14:37
+
David Kulp 2012-04-10, 14:45
+
karanveer.singh@... 2012-04-11, 05:43
Copy link to this message
-
Re: Lag function in Hive
does your table have column called "rownum"?

I think From Philip's mail, it was just an example

On Wed, Apr 11, 2012 at 11:13 AM, <[EMAIL PROTECTED]> wrote:

>
> When I try using rownum in my Hive QL query, I get: "Invalid column
> reference rownum". Am I missing something here?
>
> Regards,
> Karan
>
>
> -----Original Message-----
> From: David Kulp [mailto:[EMAIL PROTECTED]]
> Sent: 10 April 2012 20:15
> To: [EMAIL PROTECTED]
> Subject: Re: Lag function in Hive
>
> New here.  Hello all.
>
> Could you try a self-join, possibly also restricted to partitions?
>
> E.g. SELECT t2.value - t1.value FROM mytable t1, mytable t2 WHERE
> t1.rownum = t2.rownum+1 AND t1.partition=foo AND t2.partition=bar
>
> If your data is clustered by rownum, then this join should, in theory, be
> relatively fast -- especially if it makes sense to exploit partitions.
>
> -d
>
> On Apr 10, 2012, at 10:37 AM, <[EMAIL PROTECTED]> <
> [EMAIL PROTECTED]> wrote:
>
> > Makes sense but is not the distribution across nodes for a chunk of
> records in that order.
> >
> > If Hive cannot help me do this, is there another way I can do this? I
> tried generating an identifier using the perl script invoked using Hive but
> it does not seem to work fine. While the stand alone script works fine,
> when the record is created in hive using std output from perl - I see 2
> records for some of the unique identifiers. I explored the possibility of
> default data type changes but that does not solve the problem.
> >
> > Regards,
> > Karan
> >
> >
> > -----Original Message-----
> > From: Philip Tromans [mailto:[EMAIL PROTECTED]]
> > Sent: 10 April 2012 19:48
> > To: [EMAIL PROTECTED]
> > Subject: Re: Lag function in Hive
> >
> > Hi Karan,
> >
> > To the best of my knowledge, there isn't one. It's also unlikely to
> > happen because it's hard to parallelise in a map-reduce way (it
> > requires knowing where you are in a result set, and who your
> > neighbours are and they in turn need to be present on the same node as
> > you which is difficult to guarantee).
> >
> > Cheers,
> >
> > Phil.
> >
> > On 10 April 2012 14:44,  <[EMAIL PROTECTED]> wrote:
> >> Hi,
> >>
> >> Is there something like a 'lag' function in HIVE? The requirement is to
> >> calculate difference for the same column for every 2 subsequent records.
> >>
> >> For example.
> >>
> >> Row, Column A, Column B
> >> 1, 10, 100
> >> 2, 20, 200
> >> 3, 30, 300
> >>
> >>
> >> The result that I need should be like:
> >>
> >> Row, Column A, Column B, Result
> >> 1, 10, 100, NULL
> >> 2, 20, 200, 100 (200-100)
> >> 3, 30, 300, 100 (300-200)
> >>
> >> Rgds,
> >> Karan
> >>
> >>
> >>
> >>
> >>
> >> This e-mail and any attachments are confidential and intended solely
> for the
> >> addressee and may also be privileged or exempt from disclosure under
> >> applicable law. If you are not the addressee, or have received this
> e-mail
> >> in error, please notify the sender immediately, delete it from your
> system
> >> and do not copy, disclose or otherwise act upon any part of this e-mail
> or
> >> its attachments.
> >>
> >> Internet communications are not guaranteed to be secure or virus-free.
> >> The Barclays Group does not accept responsibility for any loss arising
> from
> >> unauthorised access to, or interference with, any Internet
> communications by
> >> any third party, or from the transmission of any viruses. Replies to
> this
> >> e-mail may be monitored by the Barclays Group for operational or
> business
> >> reasons.
> >>
> >> Any opinion or other information in this e-mail or its attachments that
> does
> >> not relate to the business of the Barclays Group is personal to the
> sender
> >> and is not given or endorsed by the Barclays Group.
> >>
> >> Barclays Bank PLC.Registered in England and Wales (registered no.
> 1026167).
> >> Registered Office: 1 Churchill Place, London, E14 5HP, United Kingdom.
> >>
> >> Barclays Bank PLC is authorised and regulated by the Financial Services
Nitin Pawar
+
karanveer.singh@... 2012-04-11, 08:23