Hive >> mail # dev >> non map-reduce for simple queries


Re: non map-reduce for simple queries

On 7/31/12 12:01 PM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:

>On Mon, Jul 30, 2012 at 9:12 PM, Namit Jain <[EMAIL PROTECTED]> wrote:
>
>> The total number of bytes of the input will be used to determine
>> whether or not to launch a map-reduce job for this query. That was in
>> my original mail.
>>
>> However, given any complex WHERE condition and the lack of column
>> statistics in Hive, we cannot determine the number of bytes that would
>> be needed to satisfy the WHERE condition.
>
>
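The total-input-bytes check Namit describes could be sketched roughly as follows. This is a minimal sketch of the idea only; the class, method, and threshold names are hypothetical, not Hive's actual API:

```java
// Sketch of the proposed heuristic: skip launching a map-reduce job
// when the query's total input is small enough to run locally.
// All names here are hypothetical, not Hive's real classes.
public class LocalModeDecider {
    // Hypothetical byte threshold, e.g. 128 MB of total input.
    private final long maxLocalInputBytes;

    public LocalModeDecider(long maxLocalInputBytes) {
        this.maxLocalInputBytes = maxLocalInputBytes;
    }

    /**
     * Decide purely on total input volume: without column statistics
     * we cannot predict how many bytes a WHERE clause lets through,
     * so the maximum (total) input size is the only safe metric.
     */
    public boolean runLocally(long totalInputBytes) {
        return totalInputBytes <= maxLocalInputBytes;
    }
}
```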
>All of these heuristics are guidelines, clearly. My inclination would
>be to use the maximum data volume as the primary metric until we have a
>better understanding of cases where that doesn't work well.

Maximum data volume can be used to dictate the initial behavior. That has
already been documented in the JIRA.

>If we are going to try the local solution and fall back to map-reduce,
>it seems better to put a limit well short of being done so that you
>don't waste as much work. Perhaps, if the query isn't 10% done in the
>first 5 seconds of running locally, you switch to map-reduce. Would
>that work?

That would be difficult, although the % done can be estimated from the
data already read. It might be simpler to have a check like: if the
query isn't done in the first 5 seconds of running locally, switch to
map-reduce.
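The simpler time-based fallback proposed here could be sketched like this. The names and structure are hypothetical illustrations of the idea, not Hive's actual implementation:

```java
import java.util.concurrent.*;

// Sketch of the time-based fallback: try the query locally, and if it
// has not finished within a fixed budget (e.g. 5 seconds), abandon the
// local attempt and launch a map-reduce job instead. All names are
// hypothetical, not Hive's real API.
public class LocalFirstExecutor {
    private static final long LOCAL_BUDGET_MS = 5_000;

    public <T> T execute(Callable<T> localQuery, Callable<T> mapReduceQuery)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<T> local = pool.submit(localQuery);
        try {
            // If the local run finishes inside the budget, use its result.
            return local.get(LOCAL_BUDGET_MS, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Too slow locally: cancel the attempt and fall back.
            local.cancel(true);
            return mapReduceQuery.call();
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Note that this flat timeout wastes at most 5 seconds of local work per query, avoiding the harder problem of estimating % complete.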
>
>-- Owen