Pig, mail # user - Easy question...difference between this::form and this.form?


RE: Easy question...difference between this::form and this.form?
Santhosh Srinivasan 2010-12-08, 20:02
Unambiguous column names can be accessed as-is, without the "::" prefix. An example that demonstrates this follows:

grunt> a = load 'a' as (x, y, XX);
grunt> b = load 'b' as (x, y, YY);
grunt> c = load 'c' as (x,y, ZZ);
grunt> d = join a by $0, b by $0;
grunt> describe d;
d: {a::x: bytearray,a::y: bytearray,a::XX: bytearray,b::x: bytearray,b::y: bytearray,b::YY: bytearray}
grunt> e = join d by $0, c by $0;
grunt> describe e;
e: {d::a::x: bytearray,d::a::y: bytearray,d::a::XX: bytearray,d::b::x: bytearray,d::b::y: bytearray,d::b::YY: bytearray,c::x: bytearray,c::y: bytearray,c::ZZ: bytearray}

grunt> f = foreach e generate XX;
------------------------------^^
grunt> describe f;
f: {d::a::XX: bytearray}
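
A related sketch, assuming the same relations a and b as above: a name that exists on both sides of the join (x, y) has to be qualified with its "::" prefix, and a FOREACH ... GENERATE ... AS right after the join is one way to get flat names back before the next join. The aliases d2, a_y and b_y are made up for illustration.

```pig
a = load 'a' as (x, y, XX);
b = load 'b' as (x, y, YY);
d = join a by x, b by x;

-- x and y appear in both inputs, so they must be referenced as a::x, b::y, etc.
-- Renaming every kept column with AS drops the "a::" / "b::" prefixes,
-- so a later join on d2 only adds one level of "d2::" instead of stacking.
d2 = foreach d generate a::x as x, a::y as a_y, a::XX as XX,
                        b::y as b_y, b::YY as YY;
describe d2;
```

This costs one extra projection per join, but it keeps the schema readable when several joins are chained.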

-----Original Message-----
From: Anze [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 08, 2010 12:24 AM
To: [EMAIL PROTECTED]
Subject: Re: Easy question...difference between this::form and this.form?
I'm curious - is this a problem for others as well? Do you keep 'A::C::myId'
or do you use FOREACH... GENERATE after each JOIN?

About possible workarounds:
Is it possible to write a UDF that would automatically strip 'X::' from the start of the names? For instance:
C: {A::x, A::y, B::x, B::v}
C = FLATTEN_NAMES(C, 'x');
C: {x, y, v}
('x' is the name of the column on which the JOIN was made, if it is the same in A and B.) Can something like this be done with UDFs?
(I admit it's ugly, but... ;)

Another way would be to add an argument to the JOIN (& co.), telling it to use flat names and to fail with an error if the names are ambiguous:
C = JOIN A by x, B by x FLATTEN_NAMES;
C: {x, y, v}

Anze
On Wednesday 08 December 2010, Dmitriy Ryaboy wrote:
> it's sort of true -- but, iirc, only goes one level deep, so once you
> do a second join, you are stuck with "::"s
>
> On Tue, Dec 7, 2010 at 10:11 AM, Santhosh Srinivasan <sms@yahoo-inc.com> wrote:
> > > The sql way to deal with this issue is essentially to keep the
> > > name of the parent relation around during parsing, and require
> > > that you explicitly provide the desired parent if column names
> > > are ambiguous. That's probably something that could be
> > > implemented now that we have the required metadata in the
> > > operators (I believe it wasn't there when the disambiguation
> > > design was implemented).
> >
> > Isn't that true today? Unambiguous columns can be referenced without
> > the :: operator.
> >
> > Santhosh
> >
> > -----Original Message-----
> > From: Dmitriy Ryaboy [mailto:[EMAIL PROTECTED]]
> > Sent: Tuesday, December 07, 2010 9:49 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Easy question...difference between this::form and this.form?
> >
> > Consider self-joins, with regard to the meaningful name problem...
> >
> > The sql way to deal with this issue is essentially to keep the name
> > of the parent relation around during parsing, and require that you
> > explicitly provide the desired parent if column names are ambiguous.
> > That's probably something that could be implemented now that we have
> > the required metadata in the operators (I believe it wasn't there
> > when the disambiguation design was implemented).
> >
> > As far as the difference between "::" and "." goes: the double-colon is just
> > a string with no special meaning, it's simply part of the field
> > name. The period is essentially a projection operator -- you are
> > saying, "the thing to the left of the period is a tuple, and the
> > thing to the right is a field in that tuple". (works for bags as
> > well, in which case it means, the thing to the left of the period is
> > a bag of tuples, and the thing to the right is a field in every
> > tuple in the bag)
> >
> > -Dmitriy.
> >
> > 2010/12/7 Anze <[EMAIL PROTECTED]>
> >
> > > If one uses meaningful names then Pig would never use '::' anyway.
> > > The problem is when you use multiple joins in sequence, then '::'
> > > names get very annoying.
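
To make the "::" versus "." distinction described earlier in the thread concrete, here is a minimal sketch. The relations a and b, their files, and their field names are assumed for illustration only.

```pig
a = load 'a' as (x, y);
b = load 'b' as (x, v);

-- After a JOIN the result is flat: "a::x" and "b::v" are simply field names
-- that happen to contain "::".
c  = join a by x, b by x;
c1 = foreach c generate a::x, b::v;

-- After a GROUP the result has a field named "a" that is a bag of tuples.
-- Here "." is a projection: a.y picks field y out of every tuple in the bag.
g  = group a by x;
g1 = foreach g generate group, a.y;
```

In short, "::" shows up in names produced by JOIN (and similar operators), while "." reaches inside a tuple or bag field such as the one GROUP produces.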