NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Hive >> mail # dev >> Branch for HIVE-4160


Re: Branch for HIVE-4160
There is no right answer, but I feel that if you go far down this path, it
will be very difficult to merge back. Given that this is not new
functionality but an improvement to existing code (which will also evolve),
it will become difficult to maintain and review a big diff in the future.

I haven't thought much about it, but we could start by creating the
high-level interfaces first and build incrementally from there. For example,
create interfaces for operators that take in an array of rows instead of a
single row; initially the array size can always be 1.
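A minimal sketch of the interface-first idea, assuming rows are modeled as `Object[]` (not Hive's real row format). The names `BatchOperator` and `PositiveFilter` are hypothetical. The operator accepts an array of rows, but the driver always sends a batch of size 1, so behavior matches row-at-a-time execution until a vectorized body replaces the loop.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchInterfaceSketch {

    // Hypothetical batch-oriented operator interface: the operator receives an
    // array of rows per call instead of a single row. A row is modeled as an
    // Object[] of column values (an assumption, not Hive's row representation).
    interface BatchOperator {
        void process(Object[][] rows, int size);
    }

    // Hypothetical operator that keeps rows whose first column is a positive Long.
    static class PositiveFilter implements BatchOperator {
        final List<Object[]> output = new ArrayList<>();

        @Override
        public void process(Object[][] rows, int size) {
            // Still a plain per-row loop; a vectorized body could replace it
            // later without changing the interface callers depend on.
            for (int i = 0; i < size; i++) {
                Object first = rows[i][0];
                if (first instanceof Long && (Long) first > 0) {
                    output.add(rows[i]);
                }
            }
        }
    }

    public static void main(String[] args) {
        Object[][] input = { {5L, "a"}, {-3L, "b"}, {7L, "c"} };
        PositiveFilter filter = new PositiveFilter();

        // Initially every batch holds exactly one row, so behavior matches
        // the existing row-at-a-time path.
        Object[][] batch = new Object[1][];
        for (Object[] row : input) {
            batch[0] = row;
            filter.process(batch, 1);
        }
        System.out.println(filter.output.size()); // prints 2
    }
}
```

Because callers only depend on `process(rows, size)`, raising the batch size later is a change confined to the driver and the operator body.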

What makes you think that merging a branch 6 months or a year from now will
be easier than working on the current branch?

Having said that, both approaches can be made to work, but I think you are
just delaying the merging work instead of taking the hit upfront.

Thanks,
-namit

On 4/4/13 2:40 AM, "Jitendra Pandey" <[EMAIL PROTECTED]> wrote:

>We did consider implementing these changes on trunk. However, it would
>take several patches in various parts of the code before a simple
>end-to-end query could be executed on the vectorized path. For example, a
>patch for vectorized expressions will be a significant amount of code, but
>it will not be used in a query until a vectorized operator is implemented
>and the query plan is modified to use the vectorized path. Vectorizing even
>basic expressions is non-trivial because we need to optimize for various
>cases, such as chains of expressions, non-null columns, or repeating
>values, and also handle nullable columns, short-circuit optimization, etc.
>Careful handling of these cases is important for performance gains.
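To illustrate why even a simple expression needs several code paths, here is a minimal sketch of a "column + constant" expression over a batch. The `LongColumn` class and its fields (`vector`, `isNull`, `noNulls`, `isRepeating`) are illustrative assumptions, not the HIVE-4160 design. It covers only the repeating, non-null, and nullable cases; chained expressions and short-circuiting are not shown.

```java
import java.util.Arrays;

public class VectorExpressionSketch {

    // Simplified stand-in for a column of longs in a batch. The field names
    // (vector, isNull, noNulls, isRepeating) are illustrative assumptions.
    static final class LongColumn {
        final long[] vector;
        final boolean[] isNull;
        boolean noNulls = true;      // true: no entry is null, skip isNull checks
        boolean isRepeating = false; // true: every row has the value at index 0

        LongColumn(int capacity) {
            vector = new long[capacity];
            isNull = new boolean[capacity];
        }
    }

    // Hypothetical "column + constant" expression evaluated over a whole batch.
    static void addConstant(LongColumn in, long c, LongColumn out, int size) {
        out.noNulls = in.noNulls;
        out.isRepeating = in.isRepeating;
        if (in.isRepeating) {
            // Repeating input: compute once instead of once per row.
            out.isNull[0] = !in.noNulls && in.isNull[0];
            out.vector[0] = in.vector[0] + c;
            return;
        }
        if (in.noNulls) {
            // Fast path: tight loop with no null checks.
            for (int i = 0; i < size; i++) {
                out.vector[i] = in.vector[i] + c;
            }
        } else {
            // Nullable path: propagate null flags; computing a value for null
            // slots is harmless and keeps the loop free of branches.
            for (int i = 0; i < size; i++) {
                out.isNull[i] = in.isNull[i];
                out.vector[i] = in.vector[i] + c;
            }
        }
    }

    public static void main(String[] args) {
        LongColumn in = new LongColumn(4);
        in.vector[0] = 1; in.vector[1] = 2; in.vector[2] = 3; in.vector[3] = 4;
        in.noNulls = false;
        in.isNull[2] = true;
        LongColumn out = new LongColumn(4);
        addConstant(in, 10, out, 4);
        System.out.println(Arrays.toString(out.vector)); // [11, 12, 13, 14]
        System.out.println(Arrays.toString(out.isNull)); // [false, false, true, false]
    }
}
```

Each flag selects a cheaper loop when the data allows it, which is where the performance gains described above come from.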
>
>Committing those intermediate patches to trunk without first stabilizing
>them in a branch could be a cause for concern.
>
>A separate branch will let us make incremental changes to the system so
>that each patch addresses a single feature or piece of functionality and is
>small enough to review. We will make sure that the branch is frequently
>updated with changes from trunk to avoid conflicts at merge time.
>Also, we plan to propose merging the branch as soon as a basic end-to-end
>query begins to work and is sufficiently tested, instead of waiting for
>all operators to be vectorized. Initially, our target is to make the select
>and filter operators work with vectorized expressions for primitive types.
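One common way for a vectorized filter to feed a select operator is a selection vector: the filter records which row indices survive instead of copying rows. The sketch below assumes that approach; the thread does not specify it, and `Batch`, `selected`, `selectedInUse`, and the method names are hypothetical.

```java
import java.util.Arrays;

public class VectorFilterSketch {

    // Hypothetical batch: one long column plus a selection vector listing the
    // row indices that are still live after filtering.
    static final class Batch {
        final long[] col;
        final int[] selected;
        int size;                      // number of live rows
        boolean selectedInUse = false; // false: rows 0..size-1 are all live

        Batch(long[] values) {
            col = values;
            selected = new int[values.length];
            size = values.length;
        }
    }

    // Filter "col > threshold": rather than copying rows, rewrite the selection
    // vector in place so downstream operators only visit surviving indices.
    // Writing at newSize <= j never overwrites an entry not yet read.
    static void filterGreaterThan(Batch b, long threshold) {
        int newSize = 0;
        for (int j = 0; j < b.size; j++) {
            int i = b.selectedInUse ? b.selected[j] : j;
            if (b.col[i] > threshold) {
                b.selected[newSize++] = i;
            }
        }
        b.size = newSize;
        b.selectedInUse = true;
    }

    // Select (projection) step: gather the surviving values of the column.
    static long[] selectedValues(Batch b) {
        long[] out = new long[b.size];
        for (int j = 0; j < b.size; j++) {
            out[j] = b.col[b.selectedInUse ? b.selected[j] : j];
        }
        return out;
    }

    public static void main(String[] args) {
        Batch b = new Batch(new long[] {4, 15, 8, 23, 42});
        filterGreaterThan(b, 10);   // live rows: indices 1, 3, 4
        filterGreaterThan(b, 20);   // chained filter: indices 3, 4
        System.out.println(Arrays.toString(selectedValues(b))); // [23, 42]
    }
}
```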
>
>We will have a single global configuration flag that can be used to turn
>off the entire vectorization code path, and we will specifically test that
>there is no regression in the current system when this flag is off. When
>vectorization is turned on, a validation step will check whether the given
>query is supported on the vectorized path; otherwise, the query will fall
>back to the current code path.
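The flag-plus-validation scheme can be sketched as a single decision function. The configuration key `example.vectorization.enabled`, the `Plan` class, and the supported operator and type sets are all hypothetical; the thread does not name the real flag or the exact validation rules. Requires Java 9+ for `Set.of`, `List.of`, and `Map.of`.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class VectorizationGateSketch {

    // Hypothetical configuration key; the thread does not name the actual flag.
    static final String VECTORIZATION_FLAG = "example.vectorization.enabled";

    // Assumed first-milestone scope: select and filter over a few primitive types.
    static final Set<String> SUPPORTED_OPERATORS = Set.of("SELECT", "FILTER");
    static final Set<String> SUPPORTED_TYPES = Set.of("int", "bigint", "double", "boolean");

    // Hypothetical, heavily reduced view of a query plan.
    static final class Plan {
        final List<String> operators;
        final List<String> columnTypes;

        Plan(List<String> operators, List<String> columnTypes) {
            this.operators = operators;
            this.columnTypes = columnTypes;
        }
    }

    // Validation step: every operator and column type must be supported.
    static boolean validate(Plan plan) {
        return SUPPORTED_OPERATORS.containsAll(plan.operators)
            && SUPPORTED_TYPES.containsAll(plan.columnTypes);
    }

    // With the flag off, the existing path is always used; with it on, any plan
    // that fails validation still falls back to the existing path.
    static String choosePath(Map<String, String> conf, Plan plan) {
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(VECTORIZATION_FLAG, "false"));
        return (enabled && validate(plan)) ? "vectorized" : "row-mode";
    }

    public static void main(String[] args) {
        Plan simple = new Plan(List.of("FILTER", "SELECT"), List.of("bigint", "double"));
        Plan withJoin = new Plan(List.of("FILTER", "JOIN"), List.of("bigint"));
        Map<String, String> on = Map.of(VECTORIZATION_FLAG, "true");

        System.out.println(choosePath(Map.of(), simple)); // row-mode
        System.out.println(choosePath(on, simple));       // vectorized
        System.out.println(choosePath(on, withJoin));     // row-mode
    }
}
```

Defaulting the flag to `false` means an unset configuration never touches the new code, which is what makes the "no regression when off" test meaningful.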
>
>Although we intend to follow a commit-then-review policy on the branch for
>speed of development, each patch will have an associated JIRA and will be
>available for review and feedback.
>
>thanks
>jitendra
>
>On Tue, Apr 2, 2013 at 8:37 PM, Namit Jain <[EMAIL PROTECTED]> wrote:
>
>> It will be difficult to merge the branch back.
>> Can you stage your changes incrementally?
>>
>> I mean, start with making the operators vectorized (it can be a for loop
>> to start with). I think it will be very difficult to merge it back if we
>> diverge on this.
>> I would recommend starting with simple interfaces for operators and then
>> plugging them in gradually instead of creating a new branch, unless this
>> approach is extremely difficult.
>>
>>
>> Thanks,
>> -namit
>>
>> On 4/3/13 1:52 AM, "Jitendra Pandey" <[EMAIL PROTECTED]> wrote:
>>
>> >Hi Folks,
>> >     I want to propose the creation of a separate branch for HIVE-4160
>> >work. This is a significant amount of work, and support for very basic
>> >functionality will require big chunks of code. It will also take some
>> >time to stabilize and test. A separate dev branch will allow us to do
>> >this work incrementally and collaboratively. We have already uploaded a design