Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> Help with Join involving Non-Equality condition


+
Himanish Kushary 2012-08-16, 18:07
+
Ablimit Aji 2012-08-16, 18:28
+
Himanish Kushary 2012-08-16, 18:40
+
Navis류승우 2012-08-16, 23:27
+
Bertrand Dechoux 2012-08-17, 05:52
Copy link to this message
-
Re: Help with Join involving Non-Equality condition
You should check out the work being done on. non-equi map joins
http://mail-archives.apache.org/mod_mbox/hive-dev/201206.mbox/%3C1948451998.13482.1339612423225.JavaMail.jiratomcat@issues-vm%3E

https://issues.apache.org/jira/browse/HIVE-3133

On Fri, Aug 17, 2012 at 1:52 AM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
> What are the data volume? And what are the meaning of those data?
>
> From what I can see, you have a 'pack' per day. If that's true, a map join
> could be used because you should not have that many pack creation (But I am
> not sure how to enforce that.)
> I so filtering could happen right after. You would indeed generate lots of
> tuple but they wouldn't be transported over the network nor written to disk.
>
> Even better if you really have (at least) a pack per day then you only need
> to group each request with three pack creation : the day before, the current
> day and the day after.
>
> Regards
>
> Bertrand
>
> On Fri, Aug 17, 2012 at 1:27 AM, Navis류승우 <[EMAIL PROTECTED]> wrote:
>>
>> If you don't specify join condition, hive performs cross join.
>>
>> What is added to hive 0.10.0 is just a clarifying grammar.
>>
>>
>> 2012/8/17 Himanish Kushary <[EMAIL PROTECTED]>
>>>
>>> We are on Hive 0.8 , I think cross join is available only since 0.10.0
>>>
>>> Do we have any other options ?
>>>
>>> On Thu, Aug 16, 2012 at 2:28 PM, Ablimit Aji <[EMAIL PROTECTED]> wrote:
>>> > You can do a CROSS JOIN, then filter with the original inequality join
>>> > condition.
>>> > This would generate a lot of redundant tuples and may not work if you
>>> > have
>>> > large amounts of data.
>>> >
>>> > On Thu, Aug 16, 2012 at 2:07 PM, Himanish Kushary <[EMAIL PROTECTED]>
>>> > wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> We have two tables in the following structure :
>>> >>
>>> >> Table1 :
>>> >>
>>> >> |  id   |        packcreatetime         |   packid |
>>> >> ----------------------------------------------------------------------
>>> >> | 505  |        2012-07-16 11:51:12     | 111024   |
>>> >> | 505  |        2012-07-18 11:52:13     | 111025   |
>>> >> | 505  |        2012-07-19 11:53:14   | 111026   |
>>> >> | 504  |      2012-07-17  23:50:13  |  101020  |
>>> >>
>>> >> ------------------------------------------------------------------------
>>> >>
>>> >> Table-2
>>> >>
>>> >> | id   |   requesttime
>>> >> ----------------------------------------
>>> >> | 505 | 2012-07-18 12:09:47
>>> >> | 505 | 2012-07-19 12:09:59
>>> >> | 505 | 2012-07-19 12:09:56
>>> >> | 505 | 2012-07-17 12:06:40
>>> >> | 505 | 2012-07-17 12:06:40
>>> >> | 505 | 2012-07-17 12:09:15
>>> >> | 504 | 2012-07-18 00:03:18
>>> >> | 504 | 2012-07-18 00:15:41
>>> >>
>>> >> We want to find out the packid from Table1 where the  is corresponding
>>> >> in Table2 and the requesttime(in Table2) is between the
>>> >> packcreatetime of two relevant records(in Table1)
>>> >>
>>> >> So for the above example the final output will be:
>>> >>
>>> >> | id   |   requesttime            |   packid
>>> >> -------------------------------------------------------
>>> >> | 505 | 2012-07-18 12:09:47 |  111025
>>> >> | 505 | 2012-07-19 12:09:59 |  111026
>>> >> | 505 | 2012-07-19 12:09:56 |  111026
>>> >> | 505 | 2012-07-17 12:06:40 |  111024
>>> >> | 505 | 2012-07-17 12:06:40 |  111024
>>> >> | 505 | 2012-07-17 12:09:15 |  111024
>>> >> | 504 | 2012-07-18 00:03:18 |  101020
>>> >> | 504 | 2012-07-18 00:15:41 |  101020
>>> >>
>>> >>
>>> >> As we cannot use >= , <= in Hive joins the between logic cannot be
>>> >> implemented in joins, is there any way to accomplish this or do we
>>> >> need to write custom M/R code for this.Looking forward for any
>>> >> suggestions to accomplish this.
>>> >>
>>> >> --
>>> >> Thanks & Regards
>>> >> Himanish
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Thanks & Regards
>>> Himanish
>>
>>
>
>
>
> --
> Bertrand Dechoux
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB