Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Drill >> mail # dev >> logical plan design coming together


Copy link to this message
-
Re: logical plan design coming together
For those implementing parsing & validation of the query language. Please let me share my hard-earned wisdom...

1. Separate parsing and validation. The parser should do the absolute minimum of validation. Don't try to validate identifiers. Don't do any type-checking. It will make errors better ('This function needs a boolean parameter' versus 'Expecting "true" or "false" or "<token> and" or 101 other possibilities'.) And allows the parser to stay focused on one task which is difficult enough: converting text into a parse tree.

2. During the validation phase, do not modify the parse tree. If you need to annotate each node with a type, put it into a map from parse tree node -> type, not into a field in each node. Put any state you need (e.g. scope for resolving identifiers) into a temporary state that exists only during validation (think of the visitor pattern). And definitely do not do any tree-surgery. If you need to rewrite the tree, do it post validation. (In the planner, or just before planning, is a good time.) See http://en.wikipedia.org/wiki/Immutable_object.

Julian

On Oct 12, 2012, at 10:34 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> Great comments.
>
> One particular high-level comment that Julian made is a criticism that I
> have made in the past of other projects.  It is probably good for my
> character to be on the receiving side of this criticism for once.
>
> The question is why should we use/invent a new concrete syntax when JSON
> would do just as well (I am dropping the XML part of the suggestion due to
> known prejudices on this list).
>
> I don't have a good answer to this question.  It makes certain problems
> quite a bit easier.  Moreover, I have said in the past that it is nuts to
> re-invent concrete syntax for config files and extension languages like
> this.
>
> My course going forward is that I think I will put down both syntaxes and
> let folks form their own opinion.  Using JSON will definitely move things
> ahead more quickly since other folks have done the parser for us.
>
> On Fri, Oct 12, 2012 at 12:05 AM, Julian Hyde <[EMAIL PROTECTED]> wrote:
>
>> Ted,
>>
>> Great start. I've made some comments on the doc.
>>
>> Julian
>>
>> On Oct 11, 2012, at 10:48 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>
>>> The design for the logical plan is coming together.  Anybody should be
>> able
>>> to get to the interim design document at
>>>
>>>
>> https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit
>>>
>>> You should also be able to see the discussion so far.  Many thanks to
>>> Timothy Chen for kibitzing very well as I wrote.  His astute observations
>>> and questions were critical.
>>>
>>> I have to go sleep now, but it would be great to see progress on this
>> while
>>> I sleep.  Remember that comments and questions are as valuable (or more
>> so)
>>> than text.  Remember also, this document has a complete history so we can
>>> reconstruct it no matter what happens.
>>>
>>> I would particularly like eyes on this (if practical) from Camuel, Jason,
>>> Gera and Julian Hyde.  They have had some very good thoughts about this
>>> layer in the past and probably will spot several errors in what I have
>>> written.
>>>
>>> The plan for this document as it stabilizes is to put it into the
>> web-site
>>> under the documentation area.  WE will probably want to do that before it
>>> really is done to make sure that people can find it easily and to ensure
>> a
>>> checkpoint is in Apache-land.
>>>
>>> See y'all tomorrow.
>>
>>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB