Re: Branch for HIVE-4160
Sounds good. I will create a branch soon.

Thanks,
Ashutosh
On Mon, Apr 8, 2013 at 7:31 PM, Namit Jain <[EMAIL PROTECTED]> wrote:

> Sounds good to me
>
>
> On 4/9/13 12:04 AM, "Jitendra Pandey" <[EMAIL PROTECTED]> wrote:
>
> >I agree that we shouldn't wait too long before merging the branch.
> >We are targeting to have basic queries working within a month from now and
> >will definitely propose to merge the branch back into trunk at that point.
> >We will limit the scope of the work on the branch to just a few operators
> >and primitive datatypes. Does that sound reasonable?
> >
> >regards
> >jitendra
> >
> >On Wed, Apr 3, 2013 at 9:03 PM, Namit Jain <[EMAIL PROTECTED]> wrote:
> >
> >> There is no right answer, but I feel that if you go a long way down this
> >> path, it will be very difficult to merge back. Given that this is not new
> >> functionality but an improvement to existing code (which will also
> >> evolve), it will become difficult to maintain and review a big diff in
> >> the future.
> >>
> >> I haven't thought much about it, but you can start by creating the
> >> high-level interfaces first and go from there. For example, create
> >> interfaces for operators that take in an array of rows instead of a
> >> single row; initially the array size can always be 1. Then proceed from
> >> there.
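
Illustration (not from the thread): a minimal Java sketch of the "array of rows" interface idea described above. The names `BatchOperator`, `RowAtATimeOperator`, and `CollectingOperator` are hypothetical and are not Hive's actual operator API; the sketch only assumes that existing per-row logic can be wrapped behind a batch-shaped method, so a caller may start with batches of size 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical batch-oriented operator interface (not Hive's real API). */
interface BatchOperator {
    /** Processes a batch of rows; the batch may contain a single row. */
    void process(Object[][] rows);
}

/** Adapts existing row-at-a-time logic to the batch-shaped interface. */
abstract class RowAtATimeOperator implements BatchOperator {
    /** The existing per-row logic, unchanged. */
    protected abstract void processRow(Object[] row);

    @Override
    public void process(Object[][] rows) {
        // With a batch size of 1 this behaves exactly like the per-row path.
        for (Object[] row : rows) {
            processRow(row);
        }
    }
}

/** Toy operator that records every row it sees, for demonstration only. */
class CollectingOperator extends RowAtATimeOperator {
    final List<Object[]> seen = new ArrayList<>();

    @Override
    protected void processRow(Object[] row) {
        seen.add(row);
    }
}
```

Under these assumptions, callers can migrate to the batch interface first, and individual operators can later replace the per-row loop with a genuinely vectorized implementation.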
> >>
> >> What makes you think merging a branch 6 months or a year from now will
> >> be easier than working on the current branch?
> >>
> >> Having said that, both approaches can be made to work, but I think you
> >> are just delaying the merging work instead of taking the hit upfront.
> >>
> >> Thanks,
> >> -namit
> >>
> >>
> >>
> >> On 4/4/13 2:40 AM, "Jitendra Pandey" <[EMAIL PROTECTED]> wrote:
> >>
> >> > We did consider implementing these changes on the trunk, but it would
> >> > take several patches in various parts of the code before a simple
> >> > end-to-end query could be executed on the vectorized path. For example,
> >> > a patch for vectorized expressions will be a significant amount of
> >> > code, but will not be used in a query until a vectorized operator is
> >> > implemented and the query plan is modified to use the vectorized path.
> >> > Vectorization of even basic expressions becomes non-trivial because we
> >> > need to optimize for various cases, such as chains of expressions,
> >> > non-null columns, or repeating values, and also handle nullable
> >> > columns, short-circuit optimization, etc. Careful handling of these is
> >> > important for performance gains.
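
Illustration (not from the thread): a minimal Java sketch of why even a basic vectorized expression needs several code paths. The `LongColumn` class and its `noNulls` / `isRepeating` flags are assumptions made for this example, not a description of the branch's actual data structures. The expression adds a scalar to a column and specializes for a repeating value, a column without nulls, and a nullable column.

```java
/** Hypothetical column of long values with per-batch metadata flags. */
final class LongColumn {
    final long[] values;
    final boolean[] isNull;
    boolean noNulls = true;       // true: isNull[] can be ignored entirely
    boolean isRepeating = false;  // true: entry 0 stands for every row

    LongColumn(int size) {
        values = new long[size];
        isNull = new boolean[size];
    }
}

/** Hypothetical vectorized expression: out = in + scalar, over n rows. */
final class AddScalar {
    static void evaluate(LongColumn in, long scalar, LongColumn out, int n) {
        out.isRepeating = in.isRepeating;
        out.noNulls = in.noNulls;

        if (in.isRepeating) {
            // Repeating value: compute once instead of n times.
            out.isNull[0] = !in.noNulls && in.isNull[0];
            if (!out.isNull[0]) {
                out.values[0] = in.values[0] + scalar;
            }
            return;
        }

        if (in.noNulls) {
            // Non-null column: tight loop with no per-row null check.
            for (int i = 0; i < n; i++) {
                out.values[i] = in.values[i] + scalar;
            }
        } else {
            // Nullable column: propagate nulls and skip the arithmetic for them.
            for (int i = 0; i < n; i++) {
                out.isNull[i] = in.isNull[i];
                if (!in.isNull[i]) {
                    out.values[i] = in.values[i] + scalar;
                }
            }
        }
    }
}
```

Chained expressions and short-circuit evaluation, which the message also mentions, would add further cases on top of these three.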
> >> >
> >> > Committing those intermediate patches to trunk without stabilizing
> >> > them in a branch first might be a cause for concern.
> >> >
> >> > A separate branch will let us make incremental changes to the system,
> >> > so that each patch addresses a single feature or piece of functionality
> >> > and is small enough to review.
> >> > We will make sure that the branch is frequently updated with the
> >> > changes in trunk to avoid conflicts at the time of the merge.
> >> > Also, we plan to propose merging the branch as soon as a basic
> >> > end-to-end query begins to work and is sufficiently tested, instead of
> >> > waiting for all operators to be vectorized. Initially, our target is to
> >> > make select and filter operators work with vectorized expressions for
> >> > primitive types.
> >> >
> >> > We will have a single global configuration flag that can be used to
> >> > turn off the entire vectorization code path, and we will specifically
> >> > test to make sure that when this flag is off there is no regression on
> >> > the current system. When vectorization is turned on, we will have a
> >> > validation step to make sure the given query is supported on the
> >> > vectorization path; otherwise it will fall back to the current code
> >> > path.
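
Illustration (not from the thread): a minimal Java sketch of the flag-plus-validation scheme described above. The class `ExecutionPathChooser`, the operator names, and the boolean flag parameter are hypothetical; the thread does not name the configuration property or the validator. The supported set is limited to select and filter only because those are the initial targets the message mentions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical chooser between the vectorized and the existing code path. */
final class ExecutionPathChooser {
    enum Path { VECTORIZED, ROW_AT_A_TIME }

    // Assumed for illustration: only select and filter are vectorized so far.
    private static final Set<String> SUPPORTED =
            new HashSet<>(Arrays.asList("SELECT", "FILTER"));

    static Path choose(boolean vectorizationEnabled, List<String> operators) {
        // Global flag off: never touch the vectorized code path.
        if (!vectorizationEnabled) {
            return Path.ROW_AT_A_TIME;
        }
        // Validation step: every operator in the plan must be supported,
        // otherwise fall back to the existing row-at-a-time path.
        for (String op : operators) {
            if (!SUPPORTED.contains(op)) {
                return Path.ROW_AT_A_TIME;
            }
        }
        return Path.VECTORIZED;
    }
}
```

Keeping the decision in one place like this is one way to make the "flag off means no regression" property straightforward to test.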
> >> >
> >> > Although we intend to follow a commit-then-review policy on the branch
> >> > for speed of development, each patch will have an associated jira and
> >> > will