Pig, mail # dev - Does the name of the tuple that a bag has to have matter?


RE: Does the name of the tuple that a bag has to have matter?
Santhosh Srinivasan 2011-11-21, 07:19
It's an implementation artifact of the old JavaCC-based parser in releases up to and including 0.8. The new parser, as Alan points out, should not require this.

Santhosh

-----Original Message-----
From: Alan Gates [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 18, 2011 9:00 AM
To: [EMAIL PROTECTED]
Subject: Re: Does the name of the tuple that a bag has to have matter?

The name doesn't matter. We mostly left it there for backward compatibility, both for specifying schemas and for UDFs. I do think we should make sure it is ignored everywhere, including in schema equality.

Alan.
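Alan's suggestion, that the bag's tuple name be ignored everywhere including schema equality, can be illustrated with a small sketch. It assumes a hypothetical `Field` record and a hypothetical `sameSchema` method; this is not Pig's `Schema` class, and Pig's real behaviour may differ. Field names are still compared everywhere except on the single tuple directly inside a bag.

```java
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical, minimal model of a Pig-style field schema. This is NOT
 * Pig's org.apache.pig.impl.logicalLayer.schema.Schema class.
 */
public class BagSchemaEquality {

    // A field has an optional name (alias), a type such as "chararray",
    // "int", "tuple" or "bag", and child fields for the complex types.
    record Field(String name, String type, List<Field> children) {
        static Field atom(String name, String type) {
            return new Field(name, type, List.of());
        }

        static Field tuple(String name, Field... fields) {
            return new Field(name, "tuple", List.of(fields));
        }

        static Field bag(String name, Field tuple) {
            return new Field(name, "bag", List.of(tuple));
        }
    }

    // Hypothetical structural equality: names are compared everywhere
    // except on the tuple that sits directly inside a bag.
    static boolean sameSchema(Field a, Field b) {
        return sameSchema(a, b, false);
    }

    private static boolean sameSchema(Field a, Field b, boolean insideBag) {
        if (!a.type().equals(b.type())) return false;
        // Skip the name check only for a bag's inner tuple.
        if (!insideBag && !Objects.equals(a.name(), b.name())) return false;
        if (a.children().size() != b.children().size()) return false;
        boolean childInsideBag = a.type().equals("bag");
        for (int i = 0; i < a.children().size(); i++) {
            if (!sameSchema(a.children().get(i), b.children().get(i), childInsideBag)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // b:bag{t:tuple(name:chararray,age:int)}
        Field withT = Field.bag("b", Field.tuple("t",
                Field.atom("name", "chararray"), Field.atom("age", "int")));
        // Same bag, but the inner tuple has no name.
        Field unnamed = Field.bag("b", Field.tuple(null,
                Field.atom("name", "chararray"), Field.atom("age", "int")));
        // Same bag, but a field inside the tuple is renamed.
        Field renamed = Field.bag("b", Field.tuple("t",
                Field.atom("n", "chararray"), Field.atom("age", "int")));

        System.out.println(sameSchema(withT, unnamed)); // true
        System.out.println(sameSchema(withT, renamed)); // false
    }
}
```

The design choice here is to special-case only the bag's inner tuple, so that dropping or changing its name never affects equality while the names of real fields still do.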

On Nov 16, 2011, at 7:17 PM, Jonathan Coveney wrote:

> This is related to an issue I'll probably be emailing about once I
> isolate it, but I was curious what the philosophy is around the name
> of the tuple that is in a bag.
>
> example:
> Schema s1 = Utils.getSchemaFromString("b:bag{t:tuple(name:chararray,age:int)}");
>
> In Pig 0.8 you had the whole two-level access nonsense, so let's ignore that.
> In Pig 0.9 the tuple name seemed to be preserved, and it would print with
> toString.
> In trunk, the Schema object throws away that name, and it doesn't print.
>
> I'm curious whether there is any reason to keep it around, especially given
> that you can just call Schema.equals(s1,s2,false,true) for equality without
> field names, not to mention that the name is never really going to matter,
> since a bag's schema has only one field and that field is a tuple.
>
> Thanks!
> Jon
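For comparison, the relaxed check Jon mentions ignores every alias, not only the tuple's. The sketch below assumes a hypothetical `Field` record, a hypothetical `f` helper and a hypothetical `sameIgnoringNames` method (none of them Pig's API); it only approximates the idea behind `Schema.equals(s1,s2,false,true)` by comparing types and nesting while never consulting names. Under such a comparison the bag's tuple name has no effect at all.

```java
import java.util.List;

/**
 * Hypothetical sketch, not Pig's API: compare two schemas by type and
 * nesting only, ignoring every field name (alias).
 */
public class AliasInsensitiveEquality {

    // Minimal stand-in for a field schema: optional name, type, nested fields.
    record Field(String name, String type, List<Field> children) {}

    // Hypothetical convenience factory for building nested schemas.
    static Field f(String name, String type, Field... children) {
        return new Field(name, type, List.of(children));
    }

    // Equal if the types match at every level and the nesting has the same
    // shape; names are never looked at.
    static boolean sameIgnoringNames(Field a, Field b) {
        if (!a.type().equals(b.type())) return false;
        if (a.children().size() != b.children().size()) return false;
        for (int i = 0; i < a.children().size(); i++) {
            if (!sameIgnoringNames(a.children().get(i), b.children().get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // b:bag{t:tuple(name:chararray,age:int)}
        Field s1 = f("b", "bag", f("t", "tuple", f("name", "chararray"), f("age", "int")));
        // Every name differs, but types and nesting match.
        Field s2 = f("c", "bag", f("u", "tuple", f("n", "chararray"), f("a", "int")));
        // Same names, but age is a long instead of an int.
        Field s3 = f("b", "bag", f("t", "tuple", f("name", "chararray"), f("age", "long")));

        System.out.println(sameIgnoringNames(s1, s2)); // true
        System.out.println(sameIgnoringNames(s1, s3)); // false
    }
}
```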