Hive >> mail # dev >> Branch for HIVE-4160


Re: Branch for HIVE-4160
I agree that we shouldn't wait too long before merging the branch.
We aim to have basic queries working within a month from now and will
definitely propose merging the branch back into trunk at that point.
We will limit the scope of the work on the branch to just a few operators
and primitive datatypes. Does that sound reasonable?

regards
jitendra

On Wed, Apr 3, 2013 at 9:03 PM, Namit Jain <[EMAIL PROTECTED]> wrote:

> There is no right answer, but I feel that if you go a long way down this
> path, it will be very difficult to merge back. Given that this is not new
> functionality but an improvement to existing code (which will also evolve),
> it will become difficult to maintain and review a big diff in the future.
>
> I haven't thought much about it, but you can start by creating the
> high-level interfaces first and go from there. For example, create
> interfaces for operators that take in an array of rows instead of a single
> row; initially the array size can always be 1. Then proceed from there.
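
The batch-oriented operator interface suggested above could be sketched roughly as follows. This is only an illustration under assumed names: `RowBatchOperator`, `RowAtATimeOperator`, and `CollectingFilter` are hypothetical and are not Hive's actual operator API. The adapter keeps existing per-row logic and loops over the batch, so callers can start with a batch size of 1.

```java
import java.util.ArrayList;
import java.util.List;

// All names in this sketch are hypothetical, not Hive's real operator classes.
class BatchOperatorDemo {
    // Feeds rows to the filter one at a time, i.e. with a batch size of 1.
    static int countPassing(Object[][] rows) {
        CollectingFilter filter = new CollectingFilter();
        for (Object[] row : rows) {
            filter.process(new Object[][] { row }, 1);
        }
        return filter.output.size();
    }

    public static void main(String[] args) {
        Object[][] rows = { { 5, "a" }, { 42, "b" }, { 11, "c" } };
        System.out.println(countPassing(rows));
    }
}

// Operator interface that accepts an array of rows instead of a single row.
interface RowBatchOperator {
    // Only the first `size` entries of `rows` are valid.
    void process(Object[][] rows, int size);
}

// Adapter: reuses row-at-a-time logic by looping over the batch.
abstract class RowAtATimeOperator implements RowBatchOperator {
    @Override
    public void process(Object[][] rows, int size) {
        for (int i = 0; i < size; i++) {
            processRow(rows[i]);
        }
    }

    protected abstract void processRow(Object[] row);
}

// Example operator: keeps rows whose first column is an Integer above 10.
class CollectingFilter extends RowAtATimeOperator {
    final List<Object[]> output = new ArrayList<>();

    @Override
    protected void processRow(Object[] row) {
        if (row[0] instanceof Integer && (Integer) row[0] > 10) {
            output.add(row);
        }
    }
}
```

With an interface of this shape in place, individual operators could later override `process` with a genuinely batched implementation without changing their callers.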
>
> What makes you think merging a branch 6 months or 1 year from now will be
> easier than working on the current branch?
>
> Having said that, both approaches can be made to work, but I think you are
> just delaying the merging work instead of taking the hit up front.
>
> Thanks,
> -namit
>
>
>
> On 4/4/13 2:40 AM, "Jitendra Pandey" <[EMAIL PROTECTED]> wrote:
>
> > We did consider implementing these changes on trunk. But it would take
> > several patches in various parts of the code before a simple end-to-end
> > query could be executed on the vectorized path. For example, a patch for
> > vectorized expressions will be a significant amount of code, but it will
> > not be used in a query until a vectorized operator is implemented and the
> > query plan is modified to use the vectorized path. Vectorization of even
> > basic expressions becomes nontrivial because we need to optimize for
> > various cases, such as chains of expressions, non-null columns, and
> > repeating values, and also handle nullable columns, short-circuit
> > optimization, etc. Careful handling of these is important for performance
> > gains.
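
To illustrate why even a basic vectorized expression needs several code paths, here is a minimal sketch of "add a scalar to a long column" with separate handling for repeating values, non-null columns, and nullable columns. The `LongColumn` class and its `noNulls` and `isRepeating` fields are assumptions made for this example and are not taken from the HIVE-4160 branch.

```java
// Hypothetical sketch; class and field names are assumptions, not Hive's API.
class VectorizedAddDemo {
    // Adds `scalar` to the first `size` entries of `in`, writing into `out`.
    static void addScalar(LongColumn in, long scalar, LongColumn out, int size) {
        out.noNulls = in.noNulls;
        out.isRepeating = in.isRepeating;

        if (in.isRepeating) {
            // Repeating values: one computation stands for the whole batch.
            out.isNull[0] = !in.noNulls && in.isNull[0];
            if (!out.isNull[0]) {
                out.values[0] = in.values[0] + scalar;
            }
            return;
        }

        if (in.noNulls) {
            // Non-null column: tight loop with no per-row null check.
            for (int i = 0; i < size; i++) {
                out.values[i] = in.values[i] + scalar;
            }
        } else {
            // Nullable column: propagate nulls and skip the arithmetic for them.
            for (int i = 0; i < size; i++) {
                out.isNull[i] = in.isNull[i];
                if (!in.isNull[i]) {
                    out.values[i] = in.values[i] + scalar;
                }
            }
        }
    }

    public static void main(String[] args) {
        LongColumn in = new LongColumn(3);
        in.values[0] = 1;
        in.values[1] = 2;
        in.values[2] = 3;
        LongColumn out = new LongColumn(3);
        addScalar(in, 10, out, 3);
        System.out.println(out.values[0] + " " + out.values[1] + " " + out.values[2]);
    }
}

// A column of long values for one batch of rows.
class LongColumn {
    final long[] values;
    final boolean[] isNull;
    boolean noNulls = true;      // true when no entry in the batch is null
    boolean isRepeating = false; // true when entry 0 stands for every row

    LongColumn(int size) {
        values = new long[size];
        isNull = new boolean[size];
    }
}
```

A chain of expressions would feed the output column of one such call into the next, which is where carrying the `noNulls` and `isRepeating` flags forward pays off.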
> >
> > Committing those intermediate patches to trunk without stabilizing them
> > in a branch first might be a cause for concern.
> >
> > A separate branch will let us make incremental changes to the system so
> > that each patch addresses a single feature or piece of functionality and
> > is small enough to review.
> > We will make sure that the branch is frequently updated with the changes
> > in trunk to avoid conflicts at merge time.
> > Also, we plan to propose merging the branch as soon as a basic end-to-end
> > query works and is sufficiently tested, instead of waiting for all
> > operators to be vectorized. Our initial target is to make the select and
> > filter operators work with vectorized expressions for primitive types.
> >
> > We will have a single global configuration flag that can be used to turn
> > off the entire vectorization code path, and we will specifically test
> > that there is no regression on the current system when this flag is off.
> > When vectorization is turned on, a validation step will check that the
> > given query is supported on the vectorization path; otherwise the query
> > will fall back to the current code path.
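
The flag-plus-validation behavior described above might look roughly like the sketch below. The supported operator and type sets are illustrative assumptions based on the stated initial target (select and filter on primitive types), and `choosePath` is a hypothetical helper, not a Hive method.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of choosing between the vectorized and current code paths.
class ExecutionPathDemo {
    enum Path { VECTORIZED, ROW_AT_A_TIME }

    // Assumed for illustration only: what the vectorized path can handle.
    static final Set<String> SUPPORTED_OPERATORS = Set.of("SELECT", "FILTER");
    static final Set<String> SUPPORTED_TYPES = Set.of("int", "bigint", "double", "boolean");

    static Path choosePath(boolean vectorizationEnabled,
                           List<String> operators,
                           List<String> columnTypes) {
        // Global flag off: the vectorized code path is never entered.
        if (!vectorizationEnabled) {
            return Path.ROW_AT_A_TIME;
        }
        // Validation step: every operator and column type must be supported.
        boolean supported = SUPPORTED_OPERATORS.containsAll(operators)
                && SUPPORTED_TYPES.containsAll(columnTypes);
        // Unsupported queries fall back to the current code path.
        return supported ? Path.VECTORIZED : Path.ROW_AT_A_TIME;
    }

    public static void main(String[] args) {
        System.out.println(choosePath(true, List.of("SELECT", "FILTER"), List.of("int")));
        System.out.println(choosePath(true, List.of("SELECT", "JOIN"), List.of("int")));
        System.out.println(choosePath(false, List.of("SELECT"), List.of("int")));
    }
}
```

Keeping the decision in one place makes it straightforward to test that disabling the flag always yields the existing row-at-a-time path.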
> >
> > Although we intend to follow a commit-then-review policy on the branch
> > for speed of development, each patch will have an associated JIRA and
> > will be available for review and feedback.
> >
> >thanks
> >jitendra
> >
> >On Tue, Apr 2, 2013 at 8:37 PM, Namit Jain <[EMAIL PROTECTED]> wrote:
> >
> >> It will be difficult to merge back the branch.
> >> Can you stage your changes incrementally?
> >>
> >> I mean, start with making the operators vectorized; it can be a for
> >> loop to start with. I think it will be very difficult to merge it back
> >> if we diverge on this.
> >> I would recommend starting with simple interfaces for operators and then