Hive >> mail # dev >> non map-reduce for simple queries

Re: non map-reduce for simple queries
Even if handling an arbitrary where condition is too complex, converting
queries that select specific columns seems simple enough and useful.

On Saturday, July 28, 2012, Namit Jain <[EMAIL PROTECTED]> wrote:
> Currently, hive does not launch map-reduce jobs for the following queries:
> select * from <T> where <condition on partition columns> (limit <n>)?
> This behavior is not configurable, and cannot be altered.
> HIVE-2925 wants to extend this behavior. The goal is not to spawn
> map-reduce jobs for the following queries:
> Select <expr> from <T> where <any condition> (limit <n>)?
> It is currently controlled by one parameter,
> hive.aggressive.fetch.task.conversion, which decides whether or not to
> spawn map-reduce jobs for queries of the above type. Note that this can
> be beneficial for certain types of queries, since it avoids the
> expensive step of spawning map-reduce. However, it can be pretty
> expensive for other types of queries: those selecting a very large
> number of rows, those having a very selective filter (which is satisfied
> by a very small number of rows, and therefore involves scanning a very
> large table), etc. The user does not have any control over this. Note
> that it cannot be done by hooks, since the pre-semantic hooks do not
> have enough information (type of the query, inputs, etc.), and it is too
> late to do anything in the post-semantic hook (the query plan has
> already been altered).
> I would like to propose the following configuration parameters to
> control this behavior.
> hive.fetch.task.conversion: true, false, auto
> If the value is true, then all queries with only selects and filters
> will be converted.
> If the value is false, then no query will be converted.
> If the value is auto (which should be the default behavior), there
> should be additional parameters to control the semantics:
> hive.fetch.task.auto.limit.threshold      ---> integer value X1
> hive.fetch.task.auto.inputsize.threshold  ---> integer value X2
> If either the query has a limit lower than X1, or the input size is
> smaller than X2, the queries containing only filters and selects will be
> converted to not use map-reduce jobs.
> Comments…
> -namit
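
For reference, the decision rule proposed in the quoted message could be
sketched roughly as below. This is only an illustration in Python, not Hive
code: the SimpleQuery class, the should_convert function, and the choice of
bytes as the input size unit are all assumptions made for the example, and
the threshold values X1 and X2 are left to the caller because the proposal
does not fix them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleQuery:
    """Hypothetical summary of a query containing only selects and filters."""
    limit: Optional[int]    # value of the LIMIT clause, or None if absent
    input_size_bytes: int   # total input size (unit assumed to be bytes)


def should_convert(mode: str, query: SimpleQuery,
                   limit_threshold: int, inputsize_threshold: int) -> bool:
    """Return True if the query should run without a map-reduce job.

    mode mirrors the proposed hive.fetch.task.conversion values;
    limit_threshold and inputsize_threshold stand in for X1 and X2.
    """
    if mode == "true":
        return True     # convert every select/filter-only query
    if mode == "false":
        return False    # never convert
    if mode == "auto":
        # Convert if the limit is lower than X1 OR the input is smaller than X2.
        small_limit = query.limit is not None and query.limit < limit_threshold
        small_input = query.input_size_bytes < inputsize_threshold
        return small_limit or small_input
    raise ValueError(f"unknown conversion mode: {mode!r}")
```

With illustrative thresholds of X1 = 1000 rows and X2 = 1 GB (both made up
for this example), a query with "limit 10" over a huge table would be
converted in auto mode, while an unlimited scan of the same table would
still go through map-reduce.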