Re: Hive sample test
Nice, yeah, that would do it.
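
As a concrete sketch of the pattern quoted below: the table `page_views` and its columns `user_id` and `country` are hypothetical stand-ins, not anything from the original question, and the sampling variant assumes a Hive release that supports `TABLESAMPLE` on `rand()`.

```sql
-- Hypothetical table and column names, for illustration only.

-- Let Hive run a job in a single local process when it judges the input small enough.
set hive.exec.mode.local.auto=true;

-- Check the query shape against at most 100 rows of the big table.
select country, count(distinct user_id) as users
from (
  select *
  from page_views
  limit 100
) t
where country is not null
group by country;

-- Alternative: a random sample instead of the first 100 rows.
-- Without a bucketed table this still scans every row, as noted in the thread.
select country, count(distinct user_id) as users
from page_views tablesample(bucket 1 out of 100 on rand()) s
where country is not null
group by country;
```

The 100 rows are whatever Hive happens to read first, so a selective `where` clause may leave nothing to aggregate; raise the limit if the test returns no rows. Local mode only applies when a job's input is small, so the inner scan of a large table will most likely still run on the cluster.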

On Tue, Mar 5, 2013 at 1:26 PM, Mark Grover <[EMAIL PROTECTED]> wrote:

> I typically change my query to query from a limited version of the whole
> table.
>
> Change
>
> select really_expensive_select_clause
> from
> really_big_table
> where
> something=something
> group by something=something
>
> to
>
> select really_expensive_select_clause
> from
> (
> select
> *
> from
> really_big_table
> limit 100
> )t
> where
> something=something
> group by something=something
>
>
> On Tue, Mar 5, 2013 at 10:57 AM, Dean Wampler
> <[EMAIL PROTECTED]> wrote:
> > Unfortunately, it will still go through the whole thing, then just limit the
> > output. However, there's a flag that I think only works in more recent Hive
> > releases:
> >
> > set hive.limit.optimize.enable=true
> >
> > This is supposed to apply limiting earlier in the data stream, so it will
> > give different results than limiting just the output.
> >
> > Like Chuck said, you might consider sampling, but unless your table is
> > organized into buckets, you'll at least scan the whole table, but maybe not
> > do all the computation over it?
> >
> > Also, if you have a small sample data set:
> >
> > set hive.exec.mode.local.auto=true
> >
> > will cause Hive to bypass the Job and Task Trackers, calling APIs directly,
> > when it can do the whole thing in a single process. Not "lightning fast",
> > but faster.
> >
> > dean
> >
> > On Tue, Mar 5, 2013 at 12:48 PM, Joey D'Antoni <[EMAIL PROTECTED]> wrote:
> >>
> >> Just add a limit 1 to the end of your query.
> >>
> >>
> >>
> >>
> >> On Mar 5, 2013, at 1:45 PM, Kyle B <[EMAIL PROTECTED]> wrote:
> >>
> >> Hello,
> >>
> >> I was wondering if there is a way to quick-verify a Hive query before it
> >> is run against a big dataset? The tables I am querying against have millions
> >> of records, and I'd like to verify my Hive query before I run it against all
> >> records.
> >>
> >> Is there a way to test the query against a small subset of the data,
> >> without going into full MapReduce? As silly as this sounds, is there a way
> >> to MapReduce without the overhead of MapReduce? That way I can check my
> >> query is doing what I want before I run it against all records.
> >>
> >> Thanks,
> >>
> >> -Kyle
> >
> >
> >
> >
> > --
> > Dean Wampler, Ph.D.
> > thinkbiganalytics.com
> > +1-312-339-1330
> >
>

--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330