

Re: non map-reduce for simple queries


On 7/31/12 9:23 PM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:

>On Mon, Jul 30, 2012 at 11:38 PM, Namit Jain <[EMAIL PROTECTED]> wrote:
>
>> That would be difficult. The % done can be estimated from the data already
>> read.
>>
>
>I'm confused. Wouldn't the maximum size of the data remaining over the
>maximum size of the original query give a reasonable approximation of the
>amount of work done?
>

Yes and no: the filter behavior can vary a lot across rows, so the estimate
can be off. But yes, that is the best approximation we can have.
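
As a rough illustration of that approximation, progress can be sketched as the share of input bytes already scanned. The function name and its parameters are hypothetical and not part of Hive. Since filter selectivity varies across rows, bytes read is only a proxy for work done.

```python
def estimate_fraction_done(bytes_read: int, total_bytes: int) -> float:
    """Approximate progress as the share of input bytes already scanned.

    Hypothetical helper for illustration only; not a Hive API.
    """
    if total_bytes <= 0:
        # Nothing to scan, so treat the query as complete.
        return 1.0
    # Clamp to [0, 1] in case the size estimate was too small.
    return min(max(bytes_read / total_bytes, 0.0), 1.0)
```

For example, with 950 bytes read out of an estimated 1000, the helper reports the query as roughly 95% done, regardless of how many rows actually passed the filter.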

>
>>
>> It might be simpler to have a check like: if the query isn't done in
>> the first 5 seconds of running locally, you switch to mapreduce.
>>
>
>There are three problems I see:
>  * If the query is 95% done at 5 seconds, it is a shame to kill it and
>start over again at 0% on mapreduce with a much longer latency. (Instead of
>spending the additional 0.25 seconds you spend an additional 60+.)
>  * You can't print anything until you know whether you are going to kill
>it or not. (The mapreduce results might come back in a different
>order....)
>With user-facing programs, it is much better to start printing early
>instead of later since it gives faster feedback to the user.

Neither of the above approaches lets us start printing early.

>  * It isn't predictable how the query will run. That makes it very hard to
>build applications on top of Hive.
>
>Do those make sense?
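
The "run locally for 5 seconds, then switch to mapreduce" check discussed in this thread might be sketched as follows. This is only a sketch under stated assumptions: `run_local`, `run_mapreduce`, and `clock` are hypothetical callables, not Hive APIs, and the deadline is checked only between rows rather than by interrupting the local run. Rows are buffered until the decision is made, which is the second problem listed above: nothing can be printed early.

```python
import time
from typing import Callable, Iterable, List, Tuple


def run_with_fallback(
    run_local: Callable[[], Iterable],      # hypothetical: yields rows locally
    run_mapreduce: Callable[[], Iterable],  # hypothetical: yields rows via mapreduce
    deadline_seconds: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, List]:
    """Try the local path first and restart on mapreduce if it runs too long."""
    start = clock()
    buffered: List = []  # rows cannot be shown until we know local will finish
    for row in run_local():
        if clock() - start > deadline_seconds:
            # Local work is thrown away and the query starts over at 0%.
            return "mapreduce", list(run_mapreduce())
        buffered.append(row)
    return "local", buffered
```

The returned tag makes the unpredictability visible: the same query can come back as "local" or "mapreduce" depending on how fast the first rows arrive.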