Hive, mail # user - non-equality joins


Re: non-equality joins
buddhika chamith 2012-03-17, 06:02
Hi,

I think Matt's solution is the way to go for now. If you need a basic
understanding of how reduce-side and map-side joins work, see [1]; it may
help.

Regards
Buddhika

[1] http://chamibuddhika.wordpress.com/2012/02/26/joins-with-map-reduce/
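To make Matt's suggestion (quoted below) concrete, here is a minimal sketch in plain Python rather than Hive. It imitates a reduce-side join: rows are grouped by an equality key, and the non-equality condition is applied as a filter on the matched pairs. The table names, column names, and the function equi_join_then_filter are all made up for illustration; this only works when the query has some equality key to group on.

```python
from collections import defaultdict

def equi_join_then_filter(left, right, key, predicate):
    """Group both inputs by `key` (the equi-join), then keep only the
    pairs that also satisfy the non-equality `predicate`."""
    # "Map/shuffle" step: bucket rows from both sides by the join key.
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[key]][0].append(row)
    for row in right:
        groups[row[key]][1].append(row)

    # "Reduce" step: pair up rows that share a key, then apply the filter
    # that a WHERE clause would apply after the join.
    out = []
    for k in sorted(groups):
        left_rows, right_rows = groups[k]
        for l in left_rows:
            for r in right_rows:
                if predicate(l, r):
                    out.append((l, r))
    return out

# Hypothetical data: find orders that exceed the customer's limit.
orders = [
    {"cust": 1, "amount": 50},
    {"cust": 1, "amount": 200},
    {"cust": 2, "amount": 75},
]
limits = [
    {"cust": 1, "limit": 100},
    {"cust": 2, "limit": 100},
]
over_limit = equi_join_then_filter(
    orders, limits, "cust", lambda o, l: o["amount"] > l["limit"]
)
print(over_limit)
```

The equality key keeps each reducer's work small; the non-equality test costs nothing extra because it only runs on pairs that already matched.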

On Sat, Mar 17, 2012 at 6:41 AM, Alan Gates <[EMAIL PROTECTED]> wrote:

> There are algorithms for doing general theta-joins in parallel.  Search
> Google on "theta joins parallel database" and you will find some
> interesting references.  I am not aware of any tools that implement these
> yet.  You can also do it via a cross join followed by a filter, but again
> you need special algorithms to do a cross in MapReduce, which Hive doesn't
> implement yet.  See
> http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html
> (search for the section on Cross) for a discussion of how to do cross in
> MapReduce.
>
> Alan.
>
> On Mar 13, 2012, at 10:13 AM, Tucker, Matt wrote:
>
> > For theta joins, you’ll have to convert the query to an equi-join, and
> > then filter for non-equality in the WHERE clause.  Depending upon the size
> > of each table, you might consider looking at map-side joins, which will
> > allow for doing non-equality filters during a join before it’s passed to
> > the reducers.
> >
> > Matt Tucker
> >
> > From: mahsa mofidpoor [mailto:[EMAIL PROTECTED]]
> > Sent: Tuesday, March 13, 2012 1:02 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: non-equality joins
> >
> >
> > Hi Keith,
> >
> > Do you know exactly what form an algorithm must take in order to fit
> > the MapReduce framework? Could you point me to some references?
> >
> > Thanks and Regards,
> > Mahsa
> >
> >
> >
> > On Tue, Mar 13, 2012 at 12:49 PM, Keith Wiley <[EMAIL PROTECTED]> wrote:
> > https://cwiki.apache.org/Hive/languagemanual-joins.html
> >
> > "Hive does not support join conditions that are not equality conditions
> > as it is very difficult to express such conditions as a map/reduce job."
> >
> > I admit, that isn't a very detailed answer, but it gives some indication
> > of the reason for the discrepancy between Hive and other databases.  Hive
> > fundamentally operates on Hadoop, namely on MapReduce (we all know this,
> > I'm just reiterating the train of thought).  The problem is that certain
> > algorithms are exceedingly difficult to wedge into the MapReduce framework.
> >
> > That is as detailed as my personal insight can get.  I've done a lot of
> > MapReduce programming in Hadoop but I'm not a database expert and I don't
> > really understand the steps involved in various kinds of table-joins, so I
> > don't understand the particular ways in which certain database operations
> > do or do not fit into MapReduce...but presumably non-equality joins
> > (whatever those are :-D ) are particularly difficult to MapReduceify.
> >
> > Cheers!
> >
> > On Mar 13, 2012, at 09:17 , mahsa mofidpoor wrote:
> >
> > > Hello,
> > >
> > > Is there a reason why non-equality joins are not implemented in Hive?
> > > In other words, would a theta-join be of any use if it were implemented?
> > >
> > > Thank you in advance for your response,
> > > Mahsa
> >
> >
> >
> ________________________________________________________________________________
> > Keith Wiley     [EMAIL PROTECTED]     keithwiley.com
> music.keithwiley.com
> >
> > "It's a fine line between meticulous and obsessive-compulsive and a
> slippery
> > rope between obsessive-compulsive and debilitatingly slow."
> >                                           --  Keith Wiley
> >
> ________________________________________________________________________________
> >
> >
>
>
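As a rough illustration of the "cross join followed by a filter" idea Alan mentions above, here is a small Python simulation of one grid-partitioning scheme. It is a sketch under stated assumptions, not the implementation used by Hive, Pig, or any other tool. The function name grid_theta_join and its parameters are hypothetical. Each left record picks a random row of a rows x cols grid of "reducers" and is copied to every cell in that row; each right record picks a random column and is copied to every cell in that column. Every (left, right) pair therefore meets in exactly one cell, where the theta predicate is tested.

```python
import random
from collections import defaultdict

def grid_theta_join(left, right, predicate, rows=2, cols=2, seed=0):
    """Simulate a theta join as a partitioned cross product plus filter."""
    rng = random.Random(seed)
    # cells[(r, c)] holds the left and right records sent to that "reducer".
    cells = defaultdict(lambda: ([], []))

    # "Map" step for the left input: pick one row, replicate across columns.
    for l in left:
        r = rng.randrange(rows)
        for c in range(cols):
            cells[(r, c)][0].append(l)

    # "Map" step for the right input: pick one column, replicate across rows.
    for rt in right:
        c = rng.randrange(cols)
        for r in range(rows):
            cells[(r, c)][1].append(rt)

    # "Reduce" step: each cell crosses its local records and filters.
    out = []
    for left_part, right_part in cells.values():
        for l in left_part:
            for rt in right_part:
                if predicate(l, rt):
                    out.append((l, rt))
    return out

# Hypothetical example: all pairs with left value < right value.
pairs = grid_theta_join([1, 5, 9], [3, 4, 10], lambda a, b: a < b)
print(sorted(pairs))
```

The cost is the replication: every left record is sent to cols reducers and every right record to rows reducers, which is part of why a general theta join is expensive in MapReduce compared with an equi-join.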