Hive >> mail # dev >> non map-reduce for simple queries


Namit Jain 2012-07-28, 20:35
Edward Capriolo 2012-07-28, 21:41
Navis류승우 2012-07-29, 01:17
Namit Jain 2012-07-29, 14:28
Namit Jain 2012-07-29, 14:45
Owen O'Malley 2012-07-30, 21:01
Navis류승우 2012-07-31, 01:37
Namit Jain 2012-07-31, 04:12
Owen O'Malley 2012-07-31, 06:31
Namit Jain 2012-07-31, 06:38

Re: non map-reduce for simple queries
On Mon, Jul 30, 2012 at 11:38 PM, Namit Jain <[EMAIL PROTECTED]> wrote:

> That would be difficult. The % done can be estimated from the data already
> read.
>

I'm confused. Wouldn't the maximum size of the data remaining, divided by
the maximum size of the original query, give a reasonable approximation of
the fraction of work left, and therefore of the amount of work done?
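
A minimal sketch of that estimate, for illustration only. The function and parameter names are hypothetical and are not part of Hive; the sketch assumes both sizes are upper bounds measured in bytes and that work is roughly proportional to bytes read.

```python
def estimate_fraction_done(total_input_bytes: int, remaining_bytes: int) -> float:
    """Approximate the fraction of work done from two size upper bounds.

    Hypothetical helper: assumes work is proportional to bytes read.
    """
    if total_input_bytes <= 0:
        # Nothing to read, so treat the query as complete.
        return 1.0
    # Clamp so a stale or oversized "remaining" bound cannot push the
    # estimate outside the range [0, 1].
    remaining = min(max(remaining_bytes, 0), total_input_bytes)
    return 1.0 - remaining / total_input_bytes
```

For example, with 1000 bytes of input and at most 250 bytes left to read, the estimate is 0.75.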
>
> It might be simpler to have a check like: if the query isn't done in
> the first 5 seconds of running locally, you switch to mapreduce.
>

There are three problems I see:
  * If the query is 95% done at 5 seconds, it is a shame to kill it and
start over again at 0% on mapreduce with a much longer latency. (Instead of
spending an additional 0.25 seconds, you spend an additional 60+ seconds.)
  * You can't print anything until you know whether you are going to kill
it or not. (The mapreduce results might come back in a different order.)
With user-facing programs, it is much better to start printing earlier
rather than later, since that gives the user faster feedback.
  * It isn't predictable how the query will run. That makes it very hard to
build applications on top of Hive.

Do those make sense?
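
The arithmetic behind the first point can be sketched as follows. This is an illustration under stated assumptions, not a proposed implementation: the names are hypothetical, progress is assumed to be linear in time, and the default 60-second mapreduce latency is simply the "60+" figure used above.

```python
def remaining_local_seconds(elapsed_s: float, fraction_done: float) -> float:
    """Extrapolate the time left locally, assuming linear progress."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1.0 - fraction_done) / fraction_done


def should_switch_to_mapreduce(
    elapsed_s: float,
    fraction_done: float,
    mapreduce_latency_s: float = 60.0,  # assumed restart cost
) -> bool:
    """Switch only if finishing locally is expected to take longer."""
    return remaining_local_seconds(elapsed_s, fraction_done) > mapreduce_latency_s
```

With 5 seconds elapsed and 95% done, the extrapolated remaining local time is about a quarter of a second, so restarting on mapreduce would not pay off. At 1% done after 5 seconds, the same extrapolation gives several minutes, and switching would.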
Namit Jain 2012-07-31, 17:47