Pig, mail # dev - What is the canonicalname field in a Schema object used for?


Jonathan Coveney 2011-11-17, 03:25
Alan Gates 2011-11-18, 16:58
RE: What is the canonicalname field in a Schema object used for?
Santhosh Srinivasan 2011-11-21, 07:17
Alan is right. It's meant to help with disambiguation when the same column name appears in more than one relation. In Alan's example, if B had u instead of x, the columns of C (the join) would be (A::u, v, B::u, y). A::v and B::y are also valid column names.

Santhosh
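The following is a minimal Java sketch of the naming rule described above. It assumes that each join output column remembers its alias-qualified ("canonical") name. In this model, only names that clash across inputs are shown qualified, a qualified name always resolves, and a bare name resolves only when it is unique. The class `JoinColumnResolver` and its methods are hypothetical illustrations, not part of Pig's Schema API, and Pig's actual resolution logic may differ in detail.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of join column naming; NOT Pig's implementation. */
public class JoinColumnResolver {
    private final List<String> aliases = new ArrayList<>();      // input relation per column
    private final List<String> columns = new ArrayList<>();      // bare column name
    private final List<String> displayNames = new ArrayList<>(); // e.g. "A::u" or "v"

    public JoinColumnResolver(Map<String, List<String>> inputs) {
        // Count how many inputs contribute each bare column name.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> cols : inputs.values()) {
            for (String c : cols) {
                counts.merge(c, 1, Integer::sum);
            }
        }
        for (Map.Entry<String, List<String>> e : inputs.entrySet()) {
            for (String c : e.getValue()) {
                aliases.add(e.getKey());
                columns.add(c);
                // Only clashing names need the "alias::" qualifier to be shown.
                displayNames.add(counts.get(c) > 1 ? e.getKey() + "::" + c : c);
            }
        }
    }

    public List<String> displayNames() {
        return displayNames;
    }

    /** Returns the output position of a qualified ("A::v") or bare ("v") name. */
    public int resolve(String name) {
        int sep = name.indexOf("::");
        String alias = sep >= 0 ? name.substring(0, sep) : null;
        String column = sep >= 0 ? name.substring(sep + 2) : name;
        int found = -1;
        for (int i = 0; i < columns.size(); i++) {
            boolean aliasOk = alias == null || alias.equals(aliases.get(i));
            if (aliasOk && column.equals(columns.get(i))) {
                if (found >= 0) {
                    throw new IllegalArgumentException("Ambiguous column: " + name);
                }
                found = i;
            }
        }
        if (found < 0) {
            throw new IllegalArgumentException("Unknown column: " + name);
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, List<String>> inputs = new LinkedHashMap<>();
        inputs.put("A", List.of("u", "v"));
        inputs.put("B", List.of("u", "y"));
        JoinColumnResolver c = new JoinColumnResolver(inputs);
        System.out.println(c.displayNames()); // [A::u, v, B::u, y]
        System.out.println(c.resolve("A::v") + " " + c.resolve("v")); // 1 1
    }
}
```

In this sketch a bare `u` throws an "Ambiguous column" error, which is why the qualified `A::u` and `B::u` are needed in that case.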

-----Original Message-----
From: Alan Gates [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 18, 2011 8:58 AM
To: [EMAIL PROTECTED]
Cc: Santhosh Srinivasan
Subject: Re: What is the canonicalname field in a Schema object used for?

Santhosh is the best person to answer this, as he wrote that code. But, IIRC, its purpose is to store the "full" name of a column after cogroups and joins. For example,

A = load 'foo' as (u, v);
B = load 'bar' as (x, y);
C = join A by u, B by x;

I believe the canonicalname will now hold A::u, etc.

Alan.

On Nov 16, 2011, at 7:25 PM, Jonathan Coveney wrote:

> If you do:
>
> Schema s1 = Utils.getSchemaFromString(
> "b:bag{t:tuple(name:chararray,age:int)}");
>
>
> then it will all be -1'd out. It doesn't seem to be used anywhere; I
> was just wondering, since in other cases it is populated properly.
>
>
> Thanks
>
> Jon
Jonathan Coveney 2011-11-21, 07:41