Hive >> mail # user >> Re: Lag function in Hive


karanveer.singh@... 2012-04-10, 14:51
Butani, Harish 2012-04-10, 15:10
Ashutosh Chauhan 2012-04-11, 14:54
Butani, Harish 2012-04-11, 21:39
David Kulp 2012-04-10, 14:56
Philip Tromans 2012-04-10, 15:02
David Kulp 2012-04-10, 15:07
karanveer.singh@... 2012-04-10, 13:44
Philip Tromans 2012-04-10, 14:17
Hamilton, Robert 2012-04-10, 15:01
karanveer.singh@... 2012-04-11, 08:15
Mark Grover 2012-04-11, 13:31
karanveer.singh@... 2012-04-10, 14:37
David Kulp 2012-04-10, 14:45
karanveer.singh@... 2012-04-11, 05:43
Re: Lag function in Hive
Does your table have a column called "rownum"?

I think in David's mail it was just an example: the self-join assumes your table already has a column holding consecutive row numbers.
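A minimal Python sketch of what the self-join computes, assuming some column defines the order of rows so that consecutive row numbers can be assigned first. The helper names `add_row_numbers` and `lag_diff_via_self_join` are hypothetical, and the data is Karan's example from the quoted thread. The join is treated as a left outer join so that the first row gets NULL (`None` here), matching the expected result.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (row, column_a, column_b), mirroring Karan's example table
Row = Tuple[int, int, int]


def add_row_numbers(rows: List[Row], order_key: Callable[[Row], int]) -> List[Tuple[int, Row]]:
    """Assign consecutive row numbers 1, 2, 3, ... after sorting by order_key.

    Stored rows have no inherent order, so some column must define which
    row counts as the "previous" one.
    """
    ordered = sorted(rows, key=order_key)
    return [(i + 1, r) for i, r in enumerate(ordered)]


def lag_diff_via_self_join(rows: List[Row]) -> List[Tuple[int, int, int, Optional[int]]]:
    """Emulate: t1 LEFT OUTER JOIN t2 ON t1.rownum = t2.rownum + 1, then t1.b - t2.b."""
    numbered = add_row_numbers(rows, order_key=lambda r: r[0])
    # Index the t2 side by row number so each t1 row can look up rownum - 1.
    by_rownum: Dict[int, Row] = {n: r for n, r in numbered}
    result = []
    for n, t1 in numbered:
        t2 = by_rownum.get(n - 1)  # None when there is no previous row
        diff = None if t2 is None else t1[2] - t2[2]  # current minus previous
        result.append((t1[0], t1[1], t1[2], diff))
    return result


if __name__ == "__main__":
    data = [(3, 30, 300), (1, 10, 100), (2, 20, 200)]  # deliberately unordered
    for out in lag_diff_via_self_join(data):
        print(out)
```

Running it prints `(1, 10, 100, None)`, `(2, 20, 200, 100)` and `(3, 30, 300, 100)`, the result Karan asked for. Note that the join condition pairs each row with the one before it, so the difference must be taken as the later row minus the earlier one.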

On Wed, Apr 11, 2012 at 11:13 AM, <[EMAIL PROTECTED]> wrote:

>
> When I try using rownum in my HiveQL query, I get: "Invalid column
> reference rownum". Am I missing something here?
>
> Regards,
> Karan
>
>
> -----Original Message-----
> From: David Kulp [mailto:[EMAIL PROTECTED]]
> Sent: 10 April 2012 20:15
> To: [EMAIL PROTECTED]
> Subject: Re: Lag function in Hive
>
> New here.  Hello all.
>
> Could you try a self-join, possibly also restricted to partitions?
>
> E.g. SELECT t1.value - t2.value FROM mytable t1 JOIN mytable t2
> ON (t1.rownum = t2.rownum + 1) WHERE t1.partition='foo' AND
> t2.partition='bar'
>
> If your data is clustered by rownum, then this join should, in theory, be
> relatively fast -- especially if it makes sense to exploit partitions.
>
> -d
>
> On Apr 10, 2012, at 10:37 AM, <[EMAIL PROTECTED]> <
> [EMAIL PROTECTED]> wrote:
>
> > Makes sense, but aren't the records distributed across nodes in chunks
> > that keep that order?
> >
> > If Hive cannot help me do this, is there another way? I tried
> > generating an identifier with a Perl script invoked from Hive, but it
> > does not seem to work correctly. The standalone script works fine, but
> > when the records are created in Hive from the script's standard output,
> > I see two records for some of the unique identifiers. I explored the
> > possibility of default data type changes, but that does not solve the
> > problem.
> >
> > Regards,
> > Karan
> >
> >
> > -----Original Message-----
> > From: Philip Tromans [mailto:[EMAIL PROTECTED]]
> > Sent: 10 April 2012 19:48
> > To: [EMAIL PROTECTED]
> > Subject: Re: Lag function in Hive
> >
> > Hi Karan,
> >
> > To the best of my knowledge, there isn't one. It's also unlikely to
> > happen because it's hard to parallelise in a map-reduce way (it
> > requires knowing where you are in a result set and who your
> > neighbours are, and they in turn need to be present on the same node
> > as you, which is difficult to guarantee).
> >
> > Cheers,
> >
> > Phil.
> >
> > On 10 April 2012 14:44,  <[EMAIL PROTECTED]> wrote:
> >> Hi,
> >>
> >> Is there something like a 'lag' function in Hive? The requirement is to
> >> calculate the difference in the same column between every two
> >> consecutive records.
> >>
> >> For example:
> >>
> >> Row, Column A, Column B
> >> 1, 10, 100
> >> 2, 20, 200
> >> 3, 30, 300
> >>
> >>
> >> The result that I need should be like:
> >>
> >> Row, Column A, Column B, Result
> >> 1, 10, 100, NULL
> >> 2, 20, 200, 100 (200-100)
> >> 3, 30, 300, 100 (300-200)
> >>
> >> Rgds,
> >> Karan
> >>
Nitin Pawar
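Philip's point in the quoted thread, that each row's neighbours must sit on the same node, can be illustrated with a small simulation. This is a sketch under assumptions: the `(partition_key, order_key, value)` schema and the function names `shuffle` and `lag_per_partition` are hypothetical, and the dictionary stands in for routing rows to reducers (roughly the role of Hive's DISTRIBUTE BY, with SORT BY ordering rows within a reducer). Once all rows of a group are co-located and sorted, the lag is a single linear scan; the hard part is that a lag over the whole table has only one group, so everything must go to one node.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (partition_key, order_key, value): a hypothetical schema for illustration
Record = Tuple[str, int, int]


def shuffle(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Stand-in for the shuffle: send every record with the same key to one 'node'."""
    nodes: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        nodes[rec[0]].append(rec)
    return nodes


def lag_per_partition(records: Iterable[Record]) -> List[Tuple[str, int, int, Optional[int]]]:
    """Compute value minus previous value, restarting at each partition key."""
    out = []
    for key, recs in sorted(shuffle(records).items()):
        prev: Optional[int] = None
        # Within one node: sort by order_key, then scan once; the neighbour is local.
        for _, order, value in sorted(recs, key=lambda r: r[1]):
            out.append((key, order, value, None if prev is None else value - prev))
            prev = value
    return out


if __name__ == "__main__":
    rows = [("a", 2, 200), ("b", 1, 5), ("a", 1, 100), ("a", 3, 300), ("b", 2, 9)]
    for out in lag_per_partition(rows):
        print(out)
```

The design choice here is deliberate: parallelism comes only from having many partition keys, each processed independently, which is why a lag restricted to groups is far easier to distribute than one over a single global ordering.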
karanveer.singh@... 2012-04-11, 08:23