Hive, mail # dev - non map-reduce for simple queries


Namit Jain 2012-07-28, 20:35
Edward Capriolo 2012-07-28, 21:41
Navis류승우 2012-07-29, 01:17
Namit Jain 2012-07-29, 14:28
Namit Jain 2012-07-29, 14:45
Owen O'Malley 2012-07-30, 21:01
Navis류승우 2012-07-31, 01:37
Namit Jain 2012-07-31, 04:12
Owen O'Malley 2012-07-31, 06:31
Namit Jain 2012-07-31, 06:38
Re: non map-reduce for simple queries
Owen O'Malley 2012-07-31, 15:53
On Mon, Jul 30, 2012 at 11:38 PM, Namit Jain <[EMAIL PROTECTED]> wrote:

> That would be difficult. The % done can be estimated from the data already
> read.
>

I'm confused. Wouldn't the maximum size of the data remaining divided by
the maximum size of the original query give a reasonable approximation of
the fraction of work remaining, and hence of the amount of work done?
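
A minimal sketch of that estimate, assuming upper bounds on the remaining
and total input sizes are available in bytes. The function and parameter
names are hypothetical and are not part of Hive:

```python
def estimate_fraction_done(remaining_bytes_max: int, total_bytes_max: int) -> float:
    """Approximate the fraction of work done from size upper bounds.

    Both arguments are hypothetical inputs: an upper bound on the bytes
    still to be read and an upper bound on the bytes the query will read.
    """
    if total_bytes_max <= 0:
        # Nothing to read, so treat the query as complete.
        return 1.0
    # Clamp so a stale or oversized "remaining" value cannot leave [0, 1].
    remaining = min(max(remaining_bytes_max, 0), total_bytes_max)
    return 1.0 - remaining / total_bytes_max


if __name__ == "__main__":
    # 25 MB left out of a 100 MB query.
    print(estimate_fraction_done(25_000_000, 100_000_000))
```

Because both inputs are upper bounds, this is only an approximation; it
also treats every byte as costing the same amount of work.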
>
> It might be simpler to have a check like: if the query isn't done in
> the first 5 seconds of running locally, you switch to mapreduce.
>

There are three problems I see:
  * If the query is 95% done at 5 seconds, it is a shame to kill it and
start over again at 0% on mapreduce with a much longer latency. (Instead of
spending an additional 0.25 seconds, you spend an additional 60+.)
  * You can't print anything until you know whether you are going to kill
it or not. (The mapreduce results might come back in a different order.)
With user-facing programs, it is much better to start printing early
rather than late, since it gives faster feedback to the user.
  * It isn't predictable how the query will run. That makes it very hard to
build applications on top of Hive.
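
The arithmetic behind the first point can be sketched as follows, assuming
the local query progresses at a constant rate. The helper name and the
60-second mapreduce figure are illustrative assumptions taken from the
numbers above, not measurements:

```python
def remaining_local_seconds(elapsed_seconds: float, fraction_done: float) -> float:
    """Estimate the time left for a local query, assuming a constant rate."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    # Constant rate: total time = elapsed / fraction_done.
    return elapsed_seconds * (1.0 - fraction_done) / fraction_done


# Hypothetical restart cost on mapreduce, from the "60+" figure above.
ASSUMED_MAPREDUCE_SECONDS = 60.0

if __name__ == "__main__":
    left = remaining_local_seconds(elapsed_seconds=5.0, fraction_done=0.95)
    print(f"finish locally: about {left:.2f} more seconds")
    print(f"kill and restart: at least {ASSUMED_MAPREDUCE_SECONDS:.0f} more seconds")
```

With these inputs the local estimate comes to roughly a quarter of a
second, which is the gap the first point describes.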

Do those make sense?
Namit Jain 2012-07-31, 17:47