Re: Hive sample test
Unfortunately, Hive will still process the whole data set and only then
limit the output. However, there's a flag that I think only works in more
recent Hive releases:

set hive.limit.optimize.enable=true;

This is supposed to apply the limit earlier in the data stream, so it can
give different results than limiting just the output.
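
For example, something along these lines. This is only a sketch: the table
page_views and its columns are hypothetical, and hive.limit.optimize.limit.file
is an optional tuning property whose default I'd check for your release.

```sql
-- Let Hive sample the input for simple LIMIT queries instead of reading all of it.
set hive.limit.optimize.enable=true;
-- Optional: cap how many input files are sampled (value here is an assumption).
set hive.limit.optimize.limit.file=10;

-- Hypothetical table and columns, used only for illustration.
SELECT user_id, url
FROM page_views
LIMIT 10;
```

Because only part of the input is read, rows that a full scan would have
returned can be missed, so treat the output as a sanity check, not a result.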

Like Chuck said, you might consider sampling. Unless your table is
organized into buckets, though, Hive will still scan the whole table,
although it may not have to run the rest of the computation over all of it.
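
A rough sketch of what sampling looks like, assuming two hypothetical tables:
page_views (not bucketed) and page_views_bucketed (created with
CLUSTERED BY (user_id) INTO 32 BUCKETS):

```sql
-- Unbucketed table: every row is still read, then roughly 1 in 100 is kept.
SELECT *
FROM page_views TABLESAMPLE(BUCKET 1 OUT OF 100 ON rand()) s;

-- Bucketed table, sampled on its clustering column: Hive can read just the
-- matching bucket instead of scanning the whole table.
SELECT *
FROM page_views_bucketed TABLESAMPLE(BUCKET 1 OUT OF 32 ON user_id) s;
```

In both cases the rest of the query only sees the sampled rows, which is where
the savings come from on an unbucketed table.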

Also, if you have a small sample data set:

set hive.exec.mode.local.auto=true;

will cause Hive to bypass the JobTracker and TaskTrackers and call the
MapReduce APIs directly whenever it can run the whole query in a single
local process. Not "lightning fast", but faster.
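
As a sketch, a session might look like the following. The table
page_views_sample is a hypothetical small copy of your data, and the threshold
value is an assumption; Hive only picks local mode when the input is under
its size limits, and the defaults differ between releases.

```sql
-- Let Hive decide per query whether it is small enough to run locally.
set hive.exec.mode.local.auto=true;
-- Optional: maximum total input size, in bytes, for local mode (assumed 128 MB).
set hive.exec.mode.local.auto.inputbytes.max=134217728;

-- Hypothetical small sample table; a query this size should qualify for local mode.
SELECT url, count(*) AS hits
FROM page_views_sample
GROUP BY url;
```

If the query still gets submitted to the cluster, the input is probably over
one of the local-mode thresholds.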


On Tue, Mar 5, 2013 at 12:48 PM, Joey D'Antoni <[EMAIL PROTECTED]> wrote:

> Just add a limit 1 to the end of your query.
>
> On Mar 5, 2013, at 1:45 PM, Kyle B <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> I was wondering if there is a way to quick-verify a Hive query before it
>> is run against a big dataset? The tables I am querying against have
>> millions of records, and I'd like to verify my Hive query before I run it
>> against all records.
>>
>> Is there a way to test the query against a small subset of the data,
>> without going into full MapReduce? As silly as this sounds, is there a way
>> to MapReduce without the overhead of MapReduce? That way I can check my
>> query is doing what I want before I run it against all records.
>>
>> Thanks,
>> -Kyle

*Dean Wampler, Ph.D.*