Pig, mail # user - Synthetic keys


Re: Synthetic keys
Sergey Goder 2013-05-24, 16:51
One reason you might prefer the JOIN a BY 1, b BY 1 syntax is that it lets
you specify a join type, such as the replicated join, which can improve
performance when one of the relations is small enough to fit in memory.
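
To make the comparison concrete, here is a minimal sketch of the two forms discussed in this thread. The file names and schemas are hypothetical, chosen only for illustration; the sketch assumes b is the small relation, since a replicated join loads the relations listed after the first one into memory.

```pig
-- Hypothetical inputs: paths and schemas are assumptions, not from the thread.
a = LOAD 'a.tsv' AS (id:int, name:chararray);
b = LOAD 'b.tsv' AS (code:chararray);  -- assumed small enough to fit in memory

-- Constant-key join: every row of a and b gets the key 1, so each row of a
-- pairs with each row of b. USING 'replicated' requests a map-side join.
x = JOIN a BY 1, b BY 1 USING 'replicated';

-- The built-in alternative; it does not take a USING clause.
y = CROSS a, b;

-- Compare the resulting schemas before relying on either form.
DESCRIBE x;
DESCRIBE y;
```

Running DESCRIBE on both aliases is a cheap way to check that the schemas match in your Pig version before swapping one form for the other.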
On Fri, May 24, 2013 at 9:15 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> You can do this, but Pig has a CROSS keyword that you can use instead.
>
>
> 2013/5/23 Mehmet Tepedelenlioglu <[EMAIL PROTECTED]>
>
> > Hi,
> >
> > I am using this:
> >
> > x = join a by 1, b by 1  using 'replicated';
> >
> > with the hope that it generates some synthetic key '1' on both a and b and
> > joins it on that key, thereby, in this case, doing a clean map side cross
> > of a and b with no schema changes (exactly the way a cross would work). It
> > seems to be working, but since I just tried it and it worked, I am not sure
> > if there is anything in there I should be aware of. Does anyone know?
> >
> > Thanks,
> >
> > Mehmet
> >
> >
> >
>