Drill >> mail # dev >> logical plan design coming together


Re: logical plan design coming together
For those implementing parsing & validation of the query language, please let me share my hard-earned wisdom...

1. Separate parsing and validation. The parser should do the absolute minimum of validation. Don't try to validate identifiers. Don't do any type-checking. Leaving those checks to the validator makes error messages better ('This function needs a boolean parameter' versus 'Expecting "true" or "false" or "<token> and" or 101 other possibilities'), and it allows the parser to stay focused on one task, which is difficult enough: converting text into a parse tree.

2. During the validation phase, do not modify the parse tree. If you need to annotate each node with a type, put it into a map from parse tree node -> type, not into a field in each node. Put any state you need (e.g. scope for resolving identifiers) into a temporary state that exists only during validation (think of the visitor pattern). And definitely do not do any tree-surgery. If you need to rewrite the tree, do it post validation. (In the planner, or just before planning, is a good time.) See http://en.wikipedia.org/wiki/Immutable_object.
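
To make the two points above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration, not part of the Drill design: the node classes (Literal, Identifier, Call), the Validator class, the single built-in function "not", and the type names are all hypothetical. The tree is built by hand where a parser would normally produce it; the parser would accept not(42) without complaint, and only the validator rejects it. Types go into a side table keyed by node identity, and the nodes themselves are frozen.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


# Hypothetical parse-tree nodes. frozen=True makes them immutable,
# so validation cannot annotate or rewrite them in place.
@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Identifier:
    name: str


@dataclass(frozen=True)
class Call:
    function: str
    args: Tuple["Node", ...]


Node = Union[Literal, Identifier, Call]


class ValidationError(Exception):
    pass


class Validator:
    """Holds all validation state; throw it away when validation is done."""

    def __init__(self, scope: Dict[str, str]):
        self.scope = scope  # identifier name -> type name
        # Side table: id(node) -> type name. Keyed by identity rather than
        # equality so two structurally equal nodes get separate entries.
        # Only valid while the tree is alive.
        self.types: Dict[int, str] = {}

    def type_of(self, node: Node) -> str:
        if isinstance(node, Literal):
            # bool must be tested first: in Python, bool is a subclass of int.
            if isinstance(node.value, bool):
                result = "boolean"
            elif isinstance(node.value, (int, float)):
                result = "number"
            else:
                result = "string"
        elif isinstance(node, Identifier):
            if node.name not in self.scope:
                raise ValidationError(f"Unknown identifier '{node.name}'")
            result = self.scope[node.name]
        elif isinstance(node, Call):
            arg_types = [self.type_of(arg) for arg in node.args]
            if node.function != "not":
                raise ValidationError(f"Unknown function '{node.function}'")
            if arg_types != ["boolean"]:
                raise ValidationError("Function 'not' needs a boolean parameter")
            result = "boolean"
        else:
            raise TypeError(f"Unexpected node: {node!r}")
        self.types[id(node)] = result
        return result


# Tree a parser might produce for: not(flag)
tree = Call("not", (Identifier("flag"),))
validator = Validator(scope={"flag": "boolean"})
validator.type_of(tree)
print(validator.types[id(tree)])  # boolean
```

Validating Call("not", (Literal(42),)) with the same class raises a ValidationError whose message names the function and the expected parameter type, which is the kind of error a grammar-level check cannot produce. In Java the side table would more likely be an IdentityHashMap.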

Julian

On Oct 12, 2012, at 10:34 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> Great comments.
>
> One particular high-level comment that Julian made is a criticism that I
> have made in the past of other projects.  It is probably good for my
> character to be on the receiving side of this criticism for once.
>
> The question is why should we use/invent a new concrete syntax when JSON
> would do just as well (I am dropping the XML part of the suggestion due to
> known prejudices on this list).
>
> I don't have a good answer to this question.  It makes certain problems
> quite a bit easier.  Moreover, I have said in the past that it is nuts to
> re-invent concrete syntax for config files and extension languages like
> this.
>
> My course going forward is that I think I will put down both syntaxes and
> let folks form their own opinion.  Using JSON will definitely move things
> ahead more quickly since other folks have done the parser for us.
>
> On Fri, Oct 12, 2012 at 12:05 AM, Julian Hyde <[EMAIL PROTECTED]> wrote:
>
>> Ted,
>>
>> Great start. I've made some comments on the doc.
>>
>> Julian
>>
>> On Oct 11, 2012, at 10:48 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>
>>> The design for the logical plan is coming together.  Anybody should be able
>>> to get to the interim design document at
>>>
>>> https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit
>>>
>>> You should also be able to see the discussion so far.  Many thanks to
>>> Timothy Chen for kibitzing very well as I wrote.  His astute observations
>>> and questions were critical.
>>>
>>> I have to go sleep now, but it would be great to see progress on this while
>>> I sleep.  Remember that comments and questions are as valuable (or more so)
>>> than text.  Remember also, this document has a complete history so we can
>>> reconstruct it no matter what happens.
>>>
>>> I would particularly like eyes on this (if practical) from Camuel, Jason,
>>> Gera and Julian Hyde.  They have had some very good thoughts about this
>>> layer in the past and probably will spot several errors in what I have
>>> written.
>>>
>>> The plan for this document as it stabilizes is to put it into the web-site
>>> under the documentation area.  We will probably want to do that before it
>>> really is done to make sure that people can find it easily and to ensure a
>>> checkpoint is in Apache-land.
>>>
>>> See y'all tomorrow.
>>
>>