Hive, mail # dev - Branch for HIVE-4160


Re: Branch for HIVE-4160
Jitendra Pandey 2013-04-03, 21:10
  We did consider implementing these changes on trunk. However, it would
take several patches in various parts of the code before even a simple
end-to-end query could be executed on the vectorized path. For example, a
patch for vectorized expressions will be a significant amount of code, but
it will not be used in any query until a vectorized operator is implemented
and the query plan is modified to use the vectorized path. Vectorizing even
basic expressions is non-trivial because we need to optimize for various
cases, such as chains of expressions, non-null columns, and repeating
values, and also handle nullable columns, short-circuit optimization, etc.
Careful handling of these cases is important for the performance gains.
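
  To illustrate why even a basic expression needs several code paths, here
is a minimal sketch of "column + scalar" evaluated over a batch. It is not
code from the HIVE-4160 patches: the names LongColumn, noNulls, isRepeating
and addScalar are hypothetical and chosen only for this example. It shows a
fast path for repeating values, a tight loop for non-null columns, a slower
loop for nullable columns, and a chain of two expressions.

```java
public class VectorizedAddSketch {
    /** Hypothetical column batch: a primitive array plus null/repeat metadata. */
    static final class LongColumn {
        final long[] values;
        final boolean[] isNull;
        boolean noNulls = true;      // when true, isNull can be ignored entirely
        boolean isRepeating = false; // when true, entry 0 stands for every row

        LongColumn(int size) {
            values = new long[size];
            isNull = new boolean[size];
        }
    }

    /** Computes out = in + scalar for the first n rows of the batch. */
    static void addScalar(LongColumn in, long scalar, LongColumn out, int n) {
        out.noNulls = in.noNulls;
        out.isRepeating = in.isRepeating;

        if (in.isRepeating) {
            // Repeating input: one computation covers the whole batch.
            out.isNull[0] = !in.noNulls && in.isNull[0];
            if (!out.isNull[0]) {
                out.values[0] = in.values[0] + scalar;
            }
            return;
        }
        if (in.noNulls) {
            // Non-null column: tight loop with no per-row branch.
            for (int i = 0; i < n; i++) {
                out.values[i] = in.values[i] + scalar;
            }
            return;
        }
        // Nullable column: propagate nulls and skip the arithmetic for them.
        for (int i = 0; i < n; i++) {
            out.isNull[i] = in.isNull[i];
            if (!in.isNull[i]) {
                out.values[i] = in.values[i] + scalar;
            }
        }
    }

    public static void main(String[] args) {
        int n = 4;
        LongColumn col = new LongColumn(n);
        for (int i = 0; i < n; i++) {
            col.values[i] = i + 1;
        }
        col.isNull[2] = true;
        col.noNulls = false;

        // Chain of expressions: (col + 10) + 5, one whole batch per step.
        LongColumn tmp = new LongColumn(n);
        LongColumn result = new LongColumn(n);
        addScalar(col, 10, tmp, n);
        addScalar(tmp, 5, result, n);

        for (int i = 0; i < n; i++) {
            boolean isNull = !result.noNulls && result.isNull[i];
            System.out.println(isNull ? "NULL" : String.valueOf(result.values[i]));
        }
    }
}
```

  In this sketch each expression consumes and produces a whole batch, so a
chain of expressions only passes columns along; no per-row object is
created between the steps.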

  Committing those intermediate patches to trunk without stabilizing them
in a branch first could be a cause for concern.

  A separate branch will let us make incremental changes to the system, so
that each patch addresses a single feature or piece of functionality and is
small enough to review.
  We will make sure that the branch is frequently updated with changes from
trunk, to avoid conflicts at merge time.
  Also, we plan to propose merging the branch as soon as a basic end-to-end
query works and is sufficiently tested, instead of waiting for all operators
to be vectorized. Our initial target is to make the select and filter
operators work with vectorized expressions for primitive types.

  We will have a single global configuration flag that can be used to turn
off the entire vectorization code path, and we will specifically test that
there is no regression on the current system when this flag is off. When
vectorization is turned on, a validation step will check that the given
query is supported on the vectorized path; otherwise the query will fall
back to the current code path.
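
  As a rough sketch of the flag plus validation step described above, and
not the actual implementation: the names choosePath, Path and SUPPORTED,
and the idea of validating a plain list of operator names, are assumptions
made only for this example. The supported set mirrors the initial select
and filter target.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class VectorizationFallbackSketch {
    enum Path { ROW_AT_A_TIME, VECTORIZED }

    // Hypothetical list of operators the vectorized path can handle so far.
    static final Set<String> SUPPORTED =
            new HashSet<>(Arrays.asList("SELECT", "FILTER"));

    /**
     * Picks the execution path for a query plan, given the global flag and
     * the operators in the plan (represented here simply by their names).
     */
    static Path choosePath(boolean vectorizationEnabled, List<String> operators) {
        if (!vectorizationEnabled) {
            // Flag off: the vectorized code is never reached.
            return Path.ROW_AT_A_TIME;
        }
        for (String op : operators) {
            if (!SUPPORTED.contains(op)) {
                // Validation failed: fall back to the current code path.
                return Path.ROW_AT_A_TIME;
            }
        }
        return Path.VECTORIZED;
    }

    public static void main(String[] args) {
        List<String> simple = Arrays.asList("SELECT", "FILTER");
        List<String> withJoin = Arrays.asList("SELECT", "JOIN");

        System.out.println(choosePath(true, simple));
        System.out.println(choosePath(true, withJoin));
        System.out.println(choosePath(false, simple));
    }
}
```

  Keeping the decision in one place means that with the flag off the
planner behaves exactly as it does today, which is the property the
regression tests would check.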

  Although we intend to follow a commit-then-review policy on the branch
for speed of development, each patch will have an associated JIRA and will
be available for review and feedback.

thanks
jitendra

On Tue, Apr 2, 2013 at 8:37 PM, Namit Jain <[EMAIL PROTECTED]> wrote:

> It will be difficult to merge back the branch.
> Can you stage your changes incrementally?
>
> I mean, start with making the operators vectorized - it can be a for loop
> to start with? I think it will be very difficult to merge it back if we
> diverge on this.
> I would recommend starting with simple interfaces for operators and then
> plugging them in slowly instead of a new branch, unless this approach is
> extremely difficult.
>
>
> Thanks,
> -namit
>
> On 4/3/13 1:52 AM, "Jitendra Pandey" <[EMAIL PROTECTED]> wrote:
>
> >Hi Folks,
> >     I want to propose the creation of a separate branch for HIVE-4160
> >work. This is a significant amount of work, and support for very basic
> >functionality will need big chunks of code. It will also take some time to
> >stabilize and test. A separate dev branch will allow us to do this work
> >incrementally and collaboratively. We have already uploaded a design
> >document on the jira for comments/feedback.
> >
> >thanks
> >jitendra
> >
> >
> >--
> ><http://hortonworks.com/download/>
>
>
--
<http://hortonworks.com/download/>