Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> Help with Join involving Non-Equality condition


+
Himanish Kushary 2012-08-16, 18:07
+
Ablimit Aji 2012-08-16, 18:28
+
Himanish Kushary 2012-08-16, 18:40
Copy link to this message
-
Re: Help with Join involving Non-Equality condition
If you don't specify join condition, hive performs cross join.

What is added to hive 0.10.0 is just a clarifying grammar.

2012/8/17 Himanish Kushary <[EMAIL PROTECTED]>

> We are on Hive 0.8 , I think cross join is available only since 0.10.0
>
> Do we have any other options ?
>
> On Thu, Aug 16, 2012 at 2:28 PM, Ablimit Aji <[EMAIL PROTECTED]> wrote:
> > You can do a CROSS JOIN, then filter with the original inequality join
> > condition.
> > This would generate a lot of redundant tuples and may not work if you
> have
> > large amounts of data.
> >
> > On Thu, Aug 16, 2012 at 2:07 PM, Himanish Kushary <[EMAIL PROTECTED]>
> > wrote:
> >>
> >> Hi,
> >>
> >> We have two tables in the following structure :
> >>
> >> Table1 :
> >>
> >> |  id   |        packcreatetime         |   packid |
> >> ----------------------------------------------------------------------
> >> | 505  |        2012-07-16 11:51:12     | 111024   |
> >> | 505  |        2012-07-18 11:52:13     | 111025   |
> >> | 505  |        2012-07-19 11:53:14   | 111026   |
> >> | 504  |      2012-07-17  23:50:13  |  101020  |
> >> ------------------------------------------------------------------------
> >>
> >> Table-2
> >>
> >> | id   |   requesttime
> >> ----------------------------------------
> >> | 505 | 2012-07-18 12:09:47
> >> | 505 | 2012-07-19 12:09:59
> >> | 505 | 2012-07-19 12:09:56
> >> | 505 | 2012-07-17 12:06:40
> >> | 505 | 2012-07-17 12:06:40
> >> | 505 | 2012-07-17 12:09:15
> >> | 504 | 2012-07-18 00:03:18
> >> | 504 | 2012-07-18 00:15:41
> >>
> >> We want to find out the packid from Table1 where the  is corresponding
> >> in Table2 and the requesttime(in Table2) is between the
> >> packcreatetime of two relevant records(in Table1)
> >>
> >> So for the above example the final output will be:
> >>
> >> | id   |   requesttime            |   packid
> >> -------------------------------------------------------
> >> | 505 | 2012-07-18 12:09:47 |  111025
> >> | 505 | 2012-07-19 12:09:59 |  111026
> >> | 505 | 2012-07-19 12:09:56 |  111026
> >> | 505 | 2012-07-17 12:06:40 |  111024
> >> | 505 | 2012-07-17 12:06:40 |  111024
> >> | 505 | 2012-07-17 12:09:15 |  111024
> >> | 504 | 2012-07-18 00:03:18 |  101020
> >> | 504 | 2012-07-18 00:15:41 |  101020
> >>
> >>
> >> As we cannot use >= , <= in Hive joins the between logic cannot be
> >> implemented in joins, is there any way to accomplish this or do we
> >> need to write custom M/R code for this.Looking forward for any
> >> suggestions to accomplish this.
> >>
> >> --
> >> Thanks & Regards
> >> Himanish
> >
> >
>
>
>
> --
> Thanks & Regards
> Himanish
>
+
Bertrand Dechoux 2012-08-17, 05:52
+
Edward Capriolo 2012-08-17, 13:51
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB