Hive, mail # dev - Branch for HIVE-4160


Re: Branch for HIVE-4160
Namit Jain 2013-04-04, 04:03
There is no right answer, but I feel that if you go a long way down this
path, it will be very difficult to merge back. Given that this is not new
functionality but an improvement to existing code (which will also keep
evolving), it will become difficult to maintain and review a big diff in
the future.

I haven't thought much about it, but you can start by creating the
high-level interfaces first and then go from there. For example, create
interfaces for operators which take in an array of rows instead of a single
row; initially the array size can always be 1. Then proceed from there.
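
As a rough illustration of that idea, here is a minimal Java sketch. All names in it (RowOperator, BatchOperator, LoopingBatchAdapter, CollectingOperator) are hypothetical and are not Hive's actual operator classes; it only shows how an array-of-rows interface could wrap existing row-at-a-time code with a plain for loop, so that batches of size 1 behave exactly as before.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; none of these are Hive's real operator classes.

/** Existing style: an operator that handles one row per call. */
interface RowOperator {
    void process(Object[] row);
}

/** Proposed style: an operator that accepts an array of rows. */
interface BatchOperator {
    void processBatch(Object[][] rows, int size);
}

/** Adapter that feeds a batch to a row-at-a-time operator with a for loop. */
final class LoopingBatchAdapter implements BatchOperator {
    private final RowOperator delegate;

    LoopingBatchAdapter(RowOperator delegate) {
        this.delegate = delegate;
    }

    @Override
    public void processBatch(Object[][] rows, int size) {
        // Only the first `size` entries of `rows` are considered valid.
        for (int i = 0; i < size; i++) {
            delegate.process(rows[i]);
        }
    }
}

/** Demo operator that simply records every row it is given. */
final class CollectingOperator implements RowOperator {
    final List<Object[]> seen = new ArrayList<>();

    @Override
    public void process(Object[] row) {
        seen.add(row);
    }
}

final class BatchOperatorDemo {
    public static void main(String[] args) {
        CollectingOperator sink = new CollectingOperator();
        BatchOperator op = new LoopingBatchAdapter(sink);
        // Initially every batch holds exactly one row.
        op.processBatch(new Object[][] {{1, "a"}}, 1);
        // Later, the same interface accepts larger batches unchanged.
        op.processBatch(new Object[][] {{2, "b"}, {3, "c"}}, 2);
        System.out.println(sink.seen.size());
    }
}
```

With an adapter like this, callers can move to the batch interface first, and individual operators can later replace the loop with a truly vectorized implementation one at a time.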

What makes you think that merging a branch 6 months or a year from now will
be easier than working on the current branch?

Having said that, both approaches can be made to work - but I think you
are just delaying the
merging work instead of taking the hit upfront.

Thanks,
-namit

On 4/4/13 2:40 AM, "Jitendra Pandey" <[EMAIL PROTECTED]> wrote:

>   We did consider implementing these changes on the trunk. But it would
>take several patches in various parts of the code before a simple
>end-to-end query can be executed on the vectorized path. For example, a
>patch for vectorized expressions will be a significant amount of code, but
>will not be used in a query until a vectorized operator is implemented and
>the query plan is modified to use the vectorized path. Vectorization of
>even basic expressions becomes non-trivial because we need to optimize for
>various cases like chains of expressions, non-null columns or repeating
>values, and also handle nullable columns, short-circuit optimization,
>etc. Careful handling of these is important for performance gains.
>
> Committing those intermediate patches to trunk without stabilizing them
>in a branch first might be a cause for concern.
>
>  A separate branch will let us make incremental changes to the system so
>that each patch addresses a single feature or functionality and is small
>enough to review.
>   We will make sure that the branch is frequently updated with the changes
>in the trunk to avoid conflicts at the time of the merge.
>  Also, we plan to propose merging the branch as soon as a basic end-to-end
>query begins to work and is sufficiently tested, instead of waiting for
>all operators to get vectorized. Initially our target is to make select and
>filter operators work with vectorized expressions for primitive types.
>
>   We will have a single global configuration flag that can be used to turn
>off the entire vectorization code path, and we will specifically test to
>make sure that when this flag is off there is no regression on the current
>system. When vectorization is turned on, we will have a validation step to
>make sure the given query is supported on the vectorization path; otherwise
>it will fall back to the current code path.
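
The flag-plus-validation behaviour described in the paragraph above could be sketched roughly as follows. This is only an illustration under assumed names: ExecutionPathChooser, its Path enum, and the operator strings are hypothetical and do not correspond to Hive's real classes or configuration keys. The supported set mirrors the initial target mentioned in the thread (select and filter).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; the names are illustrative, not Hive's API or config keys.
final class ExecutionPathChooser {
    enum Path { ROW_AT_A_TIME, VECTORIZED }

    // Operators assumed to have a vectorized implementation so far.
    private static final Set<String> SUPPORTED_OPERATORS =
            new HashSet<>(Arrays.asList("SELECT", "FILTER"));

    // Stands in for the single global configuration flag.
    private final boolean vectorizationEnabled;

    ExecutionPathChooser(boolean vectorizationEnabled) {
        this.vectorizationEnabled = vectorizationEnabled;
    }

    /** Validation step: every operator in the plan must be supported. */
    static boolean isSupported(List<String> planOperators) {
        return SUPPORTED_OPERATORS.containsAll(planOperators);
    }

    /** Picks the code path, falling back to the existing one when in doubt. */
    Path choose(List<String> planOperators) {
        if (!vectorizationEnabled) {
            // Flag off: the vectorized code path is never entered.
            return Path.ROW_AT_A_TIME;
        }
        return isSupported(planOperators) ? Path.VECTORIZED : Path.ROW_AT_A_TIME;
    }
}
```

Keeping the decision in one place like this makes the "no regression when the flag is off" property straightforward to test, since the row-at-a-time path is the default in every branch except the fully validated one.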
>
>  Although we intend to follow a commit-then-review policy on the branch
>for speed of development, each patch will have an associated JIRA and will
>be available for review and feedback.
>
>thanks
>jitendra
>
>On Tue, Apr 2, 2013 at 8:37 PM, Namit Jain <[EMAIL PROTECTED]> wrote:
>
>> It will be difficult to merge back the branch.
>> Can you stage your changes incrementally?
>>
>> I mean, start with making the operators vectorized - it can be a for
>> loop to start with. I think it will be very difficult to merge it back
>> if we diverge on this.
>> I would recommend starting with simple interfaces for operators and then
>> plugging them in slowly instead of a new branch, unless this approach is
>> extremely difficult.
>>
>>
>> Thanks,
>> -namit
>>
>> On 4/3/13 1:52 AM, "Jitendra Pandey" <[EMAIL PROTECTED]> wrote:
>>
>> >Hi Folks,
>> >     I want to propose the creation of a separate branch for HIVE-4160
>> >work. This is a significant amount of work, and support for very basic
>> >functionality will need big chunks of code. It will also take some time
>> >to stabilize and test. A separate dev branch will allow us to do this work
>> >incrementally and collaboratively. We have already uploaded a design