Namit Jain 2012-07-28, 20:35
Edward Capriolo 2012-07-28, 21:41
Navis류승우 2012-07-29, 01:17
Namit Jain 2012-07-29, 14:28
Namit Jain 2012-07-29, 14:45
Owen OMalley 2012-07-30, 21:01
Navis류승우 2012-07-31, 01:37
Namit Jain 2012-07-31, 04:12
Owen OMalley 2012-07-31, 06:31
Namit Jain 2012-07-31, 06:38
On Mon, Jul 30, 2012 at 11:38 PM, Namit Jain <[EMAIL PROTECTED]> wrote:
> That would be difficult. The % done can be estimated from the data already
I'm confused. Wouldn't the ratio of the maximum size of the data remaining to the
maximum size of the original query's input give a reasonable approximation of
how much work remains, and therefore of how much has been done?
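A minimal sketch of the estimate suggested above, assuming that work is roughly proportional to input bytes and that both sizes are upper bounds. The function name is hypothetical and is not part of Hive; the ratio remaining/original approximates the work left, and its complement approximates the work done.

```python
def estimate_fraction_done(remaining_bytes: int, original_bytes: int) -> float:
    """Approximate the fraction of a query's work that is complete.

    Hypothetical helper: assumes work scales with bytes read, so
    remaining/original approximates the fraction of work still left.
    """
    if original_bytes <= 0:
        # Nothing to read: treat the query as already complete.
        return 1.0
    remaining_fraction = remaining_bytes / original_bytes
    # Clamp so an over-estimated remaining size never yields a negative value.
    return min(1.0, max(0.0, 1.0 - remaining_fraction))
```

For example, `estimate_fraction_done(50, 1000)` returns approximately 0.95, i.e. the query is about 95% done.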
> It might be simpler to have a check like: if the query isn't done in
> the first 5 seconds of running locally, you switch to mapreduce.
There are three problems I see:
* If the query is 95% done at 5 seconds, it is a shame to kill it and
start over at 0% on mapreduce with a much longer latency. (Instead of
spending roughly 0.25 additional seconds, you spend an additional 60+ seconds.)
* You can't print anything until you know whether or not you are going to kill
it. (The mapreduce results might come back in a different order...)
With user-facing programs, it is much better to start printing early
rather than late, since it gives the user faster feedback.
* It becomes unpredictable how a given query will run. That makes it very hard to
build applications on top of Hive.
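The arithmetic behind the first point can be sketched as follows, assuming the local run proceeds at a constant rate and taking the "60+" seconds of mapreduce latency as a hypothetical 60-second lower bound. The function names are illustrative only and do not correspond to anything in Hive.

```python
def remaining_local_seconds(elapsed_s: float, fraction_done: float) -> float:
    """Extrapolate the local run's remaining time, assuming a constant rate."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1.0 - fraction_done) / fraction_done


def total_if_continued(elapsed_s: float, fraction_done: float) -> float:
    """Total wall time if the local run is simply allowed to finish."""
    return elapsed_s + remaining_local_seconds(elapsed_s, fraction_done)


def total_if_restarted(elapsed_s: float, mapreduce_latency_s: float = 60.0) -> float:
    """Total wall time if the local run is killed at the timeout and the
    query restarts from 0% on mapreduce (hypothetical 60 s lower bound)."""
    return elapsed_s + mapreduce_latency_s
```

With the numbers from the first point, `remaining_local_seconds(5.0, 0.95)` is about 0.26 seconds, so letting the query finish locally takes about 5.26 seconds in total, versus 65+ seconds for kill-and-restart.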
Do those make sense?
Namit Jain 2012-07-31, 17:47