Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - Help with Join involving Non-Equality condition


Copy link to this message
-
Re: Help with Join involving Non-Equality condition
Navis류승우 2012-08-16, 23:27
If you don't specify join condition, hive performs cross join.

What is added to hive 0.10.0 is just a clarifying grammar.

2012/8/17 Himanish Kushary <[EMAIL PROTECTED]>

> We are on Hive 0.8 , I think cross join is available only since 0.10.0
>
> Do we have any other options ?
>
> On Thu, Aug 16, 2012 at 2:28 PM, Ablimit Aji <[EMAIL PROTECTED]> wrote:
> > You can do a CROSS JOIN, then filter with the original inequality join
> > condition.
> > This would generate a lot of redundant tuples and may not work if you
> have
> > large amounts of data.
> >
> > On Thu, Aug 16, 2012 at 2:07 PM, Himanish Kushary <[EMAIL PROTECTED]>
> > wrote:
> >>
> >> Hi,
> >>
> >> We have two tables in the following structure :
> >>
> >> Table1 :
> >>
> >> |  id   |        packcreatetime         |   packid |
> >> ----------------------------------------------------------------------
> >> | 505  |        2012-07-16 11:51:12     | 111024   |
> >> | 505  |        2012-07-18 11:52:13     | 111025   |
> >> | 505  |        2012-07-19 11:53:14   | 111026   |
> >> | 504  |      2012-07-17  23:50:13  |  101020  |
> >> ------------------------------------------------------------------------
> >>
> >> Table-2
> >>
> >> | id   |   requesttime
> >> ----------------------------------------
> >> | 505 | 2012-07-18 12:09:47
> >> | 505 | 2012-07-19 12:09:59
> >> | 505 | 2012-07-19 12:09:56
> >> | 505 | 2012-07-17 12:06:40
> >> | 505 | 2012-07-17 12:06:40
> >> | 505 | 2012-07-17 12:09:15
> >> | 504 | 2012-07-18 00:03:18
> >> | 504 | 2012-07-18 00:15:41
> >>
> >> We want to find out the packid from Table1 where the  is corresponding
> >> in Table2 and the requesttime(in Table2) is between the
> >> packcreatetime of two relevant records(in Table1)
> >>
> >> So for the above example the final output will be:
> >>
> >> | id   |   requesttime            |   packid
> >> -------------------------------------------------------
> >> | 505 | 2012-07-18 12:09:47 |  111025
> >> | 505 | 2012-07-19 12:09:59 |  111026
> >> | 505 | 2012-07-19 12:09:56 |  111026
> >> | 505 | 2012-07-17 12:06:40 |  111024
> >> | 505 | 2012-07-17 12:06:40 |  111024
> >> | 505 | 2012-07-17 12:09:15 |  111024
> >> | 504 | 2012-07-18 00:03:18 |  101020
> >> | 504 | 2012-07-18 00:15:41 |  101020
> >>
> >>
> >> As we cannot use >= , <= in Hive joins the between logic cannot be
> >> implemented in joins, is there any way to accomplish this or do we
> >> need to write custom M/R code for this.Looking forward for any
> >> suggestions to accomplish this.
> >>
> >> --
> >> Thanks & Regards
> >> Himanish
> >
> >
>
>
>
> --
> Thanks & Regards
> Himanish
>