Pig >> mail # dev >> Does the name of the tuple that a bag has to have matter?

Jonathan Coveney 2011-11-17, 03:17
Alan Gates 2011-11-18, 17:00
RE: Does the name of the tuple that a bag has to have matter?
It's an implementation artifact of the old JavaCC parser in releases prior to and including 0.8. The new parser, as Alan points out, should not require this.


-----Original Message-----
From: Alan Gates [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 18, 2011 9:00 AM
Subject: Re: Does the name of the tuple that a bag has to have matter?

The name doesn't matter.  We mostly left it there for backward compatibility, for both specifying schemas and for UDFs.  I do think we should make sure we ignore it everywhere (including equality for schemas).


On Nov 16, 2011, at 7:17 PM, Jonathan Coveney wrote:

> This is related to an issue I'll probably be emailing about once I
> isolate it, but I was curious what the philosophy is around the name
> of the tuple that is in a bag.
> example:
> Schema s1 = Utils.getSchemaFromString("b:bag{t:tuple(name:chararray,age:int)}");
> In pig8, you had the whole two level access nonsense, so let's ignore that.
> In pig9, the tuple name seemed to be preserved, and would print with
> toString.
> In trunk, the schema object throws away that name, and it doesn't print.
> I'm curious if there is any reason to keep it around, esp. given you
> can just do Schema.equals(s1,s2,false,true) for equality without field
> names, not to mention the fact that the name never really is going to
> matter since a bag only has one element and it is a tuple.
> Thanks!
> Jon
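
To make the equality question above concrete, here is a minimal, self-contained sketch of alias-relaxed schema comparison. It is a toy model and not Pig's implementation: the `Field` class and the `sameSchema` and `bagOf` helpers are hypothetical names invented for illustration, and the `relaxAlias` flag only loosely mirrors the last argument of the `Schema.equals(s1,s2,false,true)` call quoted in the thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class BagSchemaSketch {
    /** Toy stand-in for a schema field (hypothetical; NOT Pig's Schema.FieldSchema). */
    static final class Field {
        final String alias;      // may be null when the name was dropped
        final String type;       // e.g. "bag", "tuple", "chararray", "int"
        final List<Field> inner; // nested fields, empty for simple types

        Field(String alias, String type, Field... inner) {
            this.alias = alias;
            this.type = type;
            this.inner = Arrays.asList(inner);
        }
    }

    /** Compares types recursively; aliases are compared only when relaxAlias is false. */
    static boolean sameSchema(Field a, Field b, boolean relaxAlias) {
        if (!a.type.equals(b.type)) return false;
        if (!relaxAlias && !Objects.equals(a.alias, b.alias)) return false;
        if (a.inner.size() != b.inner.size()) return false;
        for (int i = 0; i < a.inner.size(); i++) {
            if (!sameSchema(a.inner.get(i), b.inner.get(i), relaxAlias)) return false;
        }
        return true;
    }

    /** Builds b:bag{<tupleAlias>:tuple(name:chararray,age:int)} in the toy model. */
    static Field bagOf(String tupleAlias) {
        return new Field("b", "bag",
            new Field(tupleAlias, "tuple",
                new Field("name", "chararray"),
                new Field("age", "int")));
    }

    public static void main(String[] args) {
        Field s1 = bagOf("t");   // tuple name preserved
        Field s2 = bagOf(null);  // tuple name thrown away
        System.out.println(sameSchema(s1, s2, false)); // strict: aliases must match
        System.out.println(sameSchema(s1, s2, true));  // relaxed: aliases ignored
    }
}
```

In this model the two schemas differ only in the tuple's alias, so the strict comparison fails and the relaxed one succeeds. Note that the relaxed mode here ignores every alias, not just the tuple's; ignoring only the bag's inner tuple name would need a narrower check.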
Jonathan Coveney 2011-11-21, 07:43