Pig >> mail # user >> Easy question...difference between this::form and this.form?


RE: Easy question...difference between this::form and this.form?
Unambiguous column names can be accessed as-is, without the :: prefix. An example that demonstrates this follows:

grunt> a = load 'a' as (x, y, XX);
grunt> b = load 'b' as (x, y, YY);
grunt> c = load 'c' as (x,y, ZZ);
grunt> d = join a by $0, b by $0;
grunt> describe d;
d: {a::x: bytearray,a::y: bytearray,a::XX: bytearray,b::x: bytearray,b::y: bytearray,b::YY: bytearray}
grunt> e = join d by $0, c by $0;
grunt> describe e;
e: {d::a::x: bytearray,d::a::y: bytearray,d::a::XX: bytearray,d::b::x: bytearray,d::b::y: bytearray,d::b::YY: bytearray,c::x: bytearray,c::y: bytearray,c::ZZ: bytearray}

grunt> f = foreach e generate XX;
------------------------------^^
grunt> describe f;
f: {d::a::XX: bytearray}
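The lookup rule shown in the session above can be modelled outside Pig. The following Python sketch is only a toy model of that rule, not Pig's actual resolver: the function name `resolve` and the list-of-strings schema are assumptions made for illustration. A short name resolves when exactly one field equals it or ends with `::` followed by it.

```python
def resolve(schema, name):
    """Return the single schema field that `name` refers to.

    Toy model (an assumption, not Pig's implementation): a name matches a
    field if it equals the field or is its trailing '::'-separated part.
    """
    matches = [f for f in schema if f == name or f.endswith("::" + name)]
    if not matches:
        raise ValueError(f"no such column: {name}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous column {name}: {matches}")
    return matches[0]


# Field names of relation e, copied from the 'describe e' output above.
e_schema = [
    "d::a::x", "d::a::y", "d::a::XX",
    "d::b::x", "d::b::y", "d::b::YY",
    "c::x", "c::y", "c::ZZ",
]

print(resolve(e_schema, "XX"))  # d::a::XX
```

In this model `resolve(e_schema, "x")` raises an error, because three fields end in `::x`; that is the case where the prefix is needed.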

-----Original Message-----
From: Anze [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 08, 2010 12:24 AM
To: [EMAIL PROTECTED]
Subject: Re: Easy question...difference between this::form and this.form?
I'm curious - is this a problem for others as well? Do you keep 'A::C::myId'
or do you use FOREACH... GENERATE after each JOIN?

About possible workarounds:
Is it possible to write a UDF that would automatically strip 'X::' from the start of the names? For instance:
C: {A::x, A::y, B::x, B::v}
C = FLATTEN_NAMES(C, 'x');
C: {x, y, v}
('x' is the name of the column on which the JOIN was made, if it is the same in A and B.) Can something like this be done with UDFs?
(I admit it's ugly, but... ;)

Another way would be to add an argument to the JOIN (& co.), telling it to use flat names and to fail with an error if the names are ambiguous:
C = JOIN A by x, B by x FLATTEN_NAMES;
C: {x, y, v}
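To make the proposed behaviour concrete, here is a rough Python sketch of the renaming logic only. FLATTEN_NAMES does not exist in Pig; the function `flatten_names` and its `join_key` parameter are hypothetical and simply mirror the two proposals above: strip everything up to the last `::`, keep the join column once, and fail on any other name clash.

```python
def flatten_names(schema, join_key=None):
    """Strip 'X::' prefixes from field names (hypothetical helper).

    A duplicate of `join_key` is dropped, since both sides of the join
    hold the same value there; any other duplicate is an error.
    """
    short_names = []
    for field in schema:
        short = field.rsplit("::", 1)[-1]  # text after the last '::'
        if short in short_names:
            if short == join_key:
                continue  # second copy of the join column
            raise ValueError(f"ambiguous column name: {short}")
        short_names.append(short)
    return short_names


print(flatten_names(["A::x", "A::y", "B::x", "B::v"], join_key="x"))
# ['x', 'y', 'v']
```

Without `join_key`, the same call raises an error because `x` appears on both sides, which matches the "fail with an error if the names are ambiguous" variant.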

Anze
On Wednesday 08 December 2010, Dmitriy Ryaboy wrote:
> it's sort of true -- but, iirc, only goes one level deep, so once you
> do a second join, you are stuck with "::"s
>
> On Tue, Dec 7, 2010 at 10:11 AM, Santhosh Srinivasan <sms@yahoo-inc.com> wrote:
> > > The sql way to deal with this issue is essentially to keep the
> > > name of the parent relation around during parsing, and require
> > > that you explicitly provide the desired parent if column names
> > > are ambiguous. That's probably something that could be
> > > implemented now that we have the required metadata in the
> > > operators (I believe it wasn't there when the disambiguation
> > > design was implemented).
> >
> > Isn't that true today? Unambiguous columns can be referenced without
> > the :: operator.
> >
> > Santhosh
> >
> > -----Original Message-----
> > From: Dmitriy Ryaboy [mailto:[EMAIL PROTECTED]]
> > Sent: Tuesday, December 07, 2010 9:49 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Easy question...difference between this::form and this.form?
> >
> > Consider self-joins, with regard to the meaningful name problem...
> >
> > The sql way to deal with this issue is essentially to keep the name
> > of the parent relation around during parsing, and require that you
> > explicitly provide the desired parent if column names are ambiguous.
> > That's probably something that could be implemented now that we have
> > the required metadata in the operators (I believe it wasn't there
> > when the disambiguation design was implemented).
> >
> > As for the difference between "::" and ".": the double-colon is just
> > a string with no special meaning; it's simply part of the field
> > name. The period is essentially a projection operator -- you are
> > saying, "the thing to the left of the period is a tuple, and the
> > thing to the right is a field in that tuple". (works for bags as
> > well, in which case it means, the thing to the left of the period is
> > a bag of tuples, and the thing to the right is a field in every
> > tuple in the bag)
> >
> > -Dmitriy.
> >
> > 2010/12/7 Anze <[EMAIL PROTECTED]>
> >
> > > If one uses meaningful names then Pig would never use '::' anyway.
> > > The problem is when you use multiple joins in sequence, then '::'
> > > names get very annoying.