I agree that we shouldn't wait too long before merging the branch.
We aim to have basic queries working within a month from now and
will definitely propose to merge the branch back into trunk at that point.
We will limit the scope of the work on the branch to just a few operators
and primitive datatypes. Does that sound reasonable?
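
To make the row-array interface suggested in the quoted message below a bit more concrete, here is a minimal sketch. All names in it (RowOperator, RowBatchOperator, ForLoopBatchAdapter) are hypothetical and are not Hive's actual operator API; it only illustrates how an existing row-at-a-time operator could sit behind an array-of-rows interface using a plain for loop, with a batch of size 1 as the starting case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row-at-a-time operator; returns null when the row is filtered out.
interface RowOperator {
    Object[] process(Object[] row);
}

// Hypothetical operator interface that takes an array of rows instead of one row.
interface RowBatchOperator {
    Object[][] processBatch(Object[][] rows);
}

// Adapter that runs an existing row operator over a batch with a plain for loop.
// A truly vectorized operator could later replace it behind the same interface.
final class ForLoopBatchAdapter implements RowBatchOperator {
    private final RowOperator delegate;

    ForLoopBatchAdapter(RowOperator delegate) {
        this.delegate = delegate;
    }

    @Override
    public Object[][] processBatch(Object[][] rows) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : rows) {
            Object[] result = delegate.process(row);
            if (result != null) {
                out.add(result); // keep only rows the delegate did not drop
            }
        }
        return out.toArray(new Object[0][]);
    }
}

final class BatchOperatorSketch {
    // Illustrative filter: keep rows whose first column is a Long greater than 10.
    static final RowOperator FILTER_GT_10 =
            row -> ((Long) row[0]) > 10L ? row : null;

    public static void main(String[] args) {
        RowBatchOperator op = new ForLoopBatchAdapter(FILTER_GT_10);
        // Batch of size 1, as in the "array size can always be 1" starting point.
        Object[][] single = op.processBatch(new Object[][] { {42L, "a"} });
        // A larger batch goes through the same interface unchanged.
        Object[][] mixed = op.processBatch(
                new Object[][] { {5L, "b"}, {11L, "c"}, {10L, "d"} });
        System.out.println(single.length + " " + mixed.length);
    }
}
```

The adapter gives no performance benefit by itself; the point is only that callers could move to the batch interface before any operator is actually vectorized.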
On Wed, Apr 3, 2013 at 9:03 PM, Namit Jain <[EMAIL PROTECTED]> wrote:
> There is no right answer, but I feel if you go this path a long way, it
> will be very difficult
> to merge back. Given that this is not new functionality but an improvement
> to existing code
> (which will also evolve), it will become difficult to maintain/review a
> big diff in the future.
> I haven't thought much about it, but you can start by creating the high-level
> interfaces first, and then
> going from there. For example: create interfaces for operators which take in
> an array of rows instead of
> a single row - initially the array size can always be 1. Now, proceed from
> What makes you think merging a branch 6 months/1 year from now will be
> easier than working on the
> current branch?
> Having said that, both approaches can be made to work - but I think you
> are just delaying the
> merging work instead of taking the hit upfront.
> On 4/4/13 2:40 AM, "Jitendra Pandey" <[EMAIL PROTECTED]> wrote:
> > We did consider implementing these changes on the trunk. But, it would
> >take several patches in various parts of the code before a simple end-to-end
> >query can be executed on the vectorized path. For example, a patch for
> >vectorized expressions will be a significant amount of code, but will not
> >be used in a query until a vectorized operator is implemented and the
> >plan is modified to use the vectorized path. Vectorization of even basic
> >expressions becomes non-trivial because we need to optimize for various
> >cases such as chains of expressions, non-null columns, or repeating values,
> >and also handle nullable columns, short-circuit optimization,
> >etc. Careful handling of these is important for performance gains.
> > Committing those intermediate patches in trunk without stabilizing them
> >in a branch first might be a cause for concern.
> > A separate branch will let us make incremental changes to the system so
> >that each patch addresses a single feature or functionality and is small
> >enough to review.
> > We will make sure that the branch is frequently updated with the changes
> >in the trunk to avoid conflicts at the time of the merge.
> > Also, we plan to propose merging the branch as soon as a basic end-to-end
> >query begins to work and is sufficiently tested, instead of waiting for
> >all operators to get vectorized. Initially our target is to make select and
> >filter operators work with vectorized expressions for primitive types.
> > We will have a single global configuration flag that can be used to turn
> >off the entire vectorization code path and we will specifically test to
> >make sure that when this flag is off there is no regression on the current
> >system. When vectorization is turned on, we will have a validation step to
> >make sure the given query is supported on the vectorization path; otherwise
> >it will fall back to the current code path.
> > Although we intend to follow a commit-then-review policy on the branch for
> >speed of development, each patch will have an associated JIRA and will be
> >available for review and feedback.
> >On Tue, Apr 2, 2013 at 8:37 PM, Namit Jain <[EMAIL PROTECTED]> wrote:
> >> It will be difficult to merge back the branch.
> >> Can you stage your changes incrementally?
> >> I mean, start with making the operators vectorized - it can be a for
> >> loop to
> >> start with? I think it will be very difficult to merge it back if we
> >> diverge on this.
> >> I would recommend starting with simple interfaces for operators and then