NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig >> mail # user >> field name reference - alias


Re: field name reference - alias
This is expected behavior. The disambiguation comes only after two or more
relations are brought together.

As per the docs at
http://pig.apache.org/docs/r0.11.1/basic.html#disambiguate, the
disambiguate operator can only be used to identify field names after JOIN,
COGROUP, CROSS, or FLATTEN operators.
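As a minimal sketch of when the prefix is actually required, assume two
hypothetical inputs ('a.txt' and 'b.txt', not from your script) that both
declare a field named 'id'. After the JOIN, a bare 'id' would be ambiguous,
so the prefixed name is the only way to pick one of them:

```pig
-- Hypothetical inputs: both relations declare a field named 'id'.
A = LOAD 'a.txt' USING PigStorage(' ') AS (id:int, name:chararray);
B = LOAD 'b.txt' USING PigStorage(' ') AS (id:int, city:chararray);

C = JOIN A BY id, B BY id;
-- C: {A::id: int,A::name: chararray,B::id: int,B::city: chararray}

-- 'id' alone matches both A::id and B::id here, so the prefix is needed.
-- 'name' and 'city' are unique in C, so the short names still resolve.
D = FOREACH C GENERATE A::id AS id, name, city;
DESCRIBE D;
```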

The difference between the first and third example is that in your first
example, you have a JOIN operator. You would get a syntax error if you
tried to say

    C = JOIN A by A::x LEFT OUTER, B by a;

There is no field named 'A::x' in A. In C, however, there is a field
named 'A::x'. You can refer to this field as 'x' (because no other field in
C is named 'x') or as 'A::x'.
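A short sketch of the same point using the relations from your message (the
aliases M, M2, D1 and D2 are just names I picked for illustration): the bare
name is the only valid one before the JOIN, and either form should resolve
after it, including inside TOMAP.

```pig
A = LOAD '1.txt'     USING PigStorage(' ') AS (x:int, y:chararray, z:chararray);
B = LOAD '1_ext.txt' USING PigStorage(' ') AS (a:int, b:chararray, c:chararray);

-- Before the JOIN, A's schema only contains 'x', so use the bare name.
M = FOREACH A GENERATE TOMAP('toto', x);

C = JOIN A BY x LEFT OUTER, B BY a;

-- After the JOIN, C contains a field named 'A::x'; both forms refer to it.
D1 = FOREACH C GENERATE A::x AS toto;
D2 = FOREACH C GENERATE x AS toto;

-- The prefixed name can be passed to TOMAP too, now that it exists in C.
M2 = FOREACH C GENERATE TOMAP('toto', A::x);
```

So the restriction is tied to the schema of the relation being projected,
not to TOMAP itself.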

Hope that helps.

On Thu, Aug 8, 2013 at 9:59 PM, Keren Ouaknine <[EMAIL PROTECTED]> wrote:

> Hello,
>
> Can one refer to a field name with no ambiguity by its full name (A::x
> instead of x)? Below are two contradictory behaviors:
>
> *First example:*
> A = load '1.txt'      using PigStorage(' ')  as (x:int, y:chararray,
> z:chararray);
> B = load '1_ext.txt'  using PigStorage(' ')  as (a:int, b:chararray,
> c:chararray);
> C = JOIN A by x LEFT OUTER, B BY a;
> D = FOREACH C GENERATE A::x as toto;
> describe C;
> describe D;
>
> *output:*
> C: {A::x: int,A::y: chararray,A::z: chararray,B::a: int,B::b:
> chararray,B::c: chararray}
> D: {toto: int}
>
> It also works if you refer to A::x as x.
>
> *Second example with TOMAP:*
> A = load '1.txt'  using PigStorage(' ')  as (x:int, y:chararray,
> z:chararray);
> B = FOREACH A GENERATE TOMAP('toto', x);
> describe B;
> DUMP B;
> store B into '/home/kereno/Documents/pig-0.11.1/workspace/res';
>
> *output:*
> B: {map[]}
>
> If you change the script to refer to A::x, you get an error, as
> follows:
> A = load '1.txt'  using PigStorage(' ')  as (x:int, y:chararray,
> z:chararray);
> B = FOREACH A GENERATE TOMAP('toto', A::x);
> describe B;
> DUMP B;
> store B into '/home/kereno/Documents/pig-0.11.1/workspace/res';
>
> *output:*
> <file tomap.pig, line 2, column 37> Invalid field projection. Projected
> field [A::x] does not exist in schema: x:int,y:chararray,z:chararray.
>
> My question is: why can I use either form in the first FOREACH but not
> with TOMAP?
> Side note: I am asking because I am generating schemas from a Pig script
> and using them as input for another language (a project translating Pig
> to Algebricks), and I would like to be consistent with Pig's behavior :).
>
> Thanks,
> Keren
>
> --
> Keren Ouaknine
> Web: www.kereno.com
>