

Re: Hive sample test
I typically change my query to select from a limited version of the whole table.

Change

select really_expensive_select_clause
from really_big_table
where something = something
group by something

to

select really_expensive_select_clause
from
(
  select *
  from really_big_table
  limit 100
) t
where something = something
group by something
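
For example, with a hypothetical sales table (the table and column names here are made up for illustration), the wrapped version would look like:

select region, sum(amount) as total_amount
from
(
  select *
  from sales
  limit 100
) t
where region = 'US'
group by region

Because the limit 100 is applied in the inner query, only those rows reach the filter and aggregation, so the statement returns quickly and just confirms the query runs and produces the shape you expect; the numbers themselves are not meaningful.
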
On Tue, Mar 5, 2013 at 10:57 AM, Dean Wampler
<[EMAIL PROTECTED]> wrote:
> Unfortunately, it will still go through the whole thing, then just limit the
> output. However, there's a flag that I think only works in more recent Hive
> releases:
>
> set hive.limit.optimize.enable=true
>
> This is supposed to apply limiting earlier in the data stream, so it will
> give different results than limiting just the output.
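
A minimal sketch of trying that flag from the Hive CLI; the exact behavior and the related hive.limit.row.max.size setting depend on the Hive release, so treat the values below as assumptions:

set hive.limit.optimize.enable=true;
-- in some releases this influences how much data the limit optimization samples
set hive.limit.row.max.size=100000;

select really_expensive_select_clause
from really_big_table
where something = something
limit 100;
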
>
> Like Chuck said, you might consider sampling, but unless your table is
> organized into buckets, you'll still scan the whole table, though you may
> avoid doing all of the computation over it.
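
For reference, bucket sampling in Hive looks roughly like this; it assumes really_big_table was created with something like CLUSTERED BY (something) INTO 32 BUCKETS, and the bucket count here is made up:

-- assumes the table is bucketed on the sampling column
select really_expensive_select_clause
from really_big_table tablesample(bucket 1 out of 32 on something)
where something = something
group by something;

When the table really is bucketed on the sampling column, Hive can prune to the matching bucket's files instead of reading everything; on an unbucketed table it still scans the full table, as noted above.
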
>
> Also, if you have a small sample data set:
>
> set hive.exec.mode.local.auto=true
>
> will cause Hive to bypass the Job and Task Trackers, calling APIs directly,
> when it can do the whole thing in a single process. Not "lightning fast",
> but faster.
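
A sketch of turning that on in a session; the two threshold settings below exist in many Hive releases, but the names and defaults vary by version, so they are assumptions here:

set hive.exec.mode.local.auto=true;
-- local mode is only chosen when the input is small enough, e.g.:
set hive.exec.mode.local.auto.inputbytes.max=134217728;   -- ~128 MB
set hive.exec.mode.local.auto.input.files.max=4;
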
>
> dean
>
> On Tue, Mar 5, 2013 at 12:48 PM, Joey D'Antoni <[EMAIL PROTECTED]> wrote:
>>
>> Just add a limit 1 to the end of your query.
>>
>>
>>
>>
>> On Mar 5, 2013, at 1:45 PM, Kyle B <[EMAIL PROTECTED]> wrote:
>>
>> Hello,
>>
>> I was wondering if there is a way to quick-verify a Hive query before it
>> is run against a big dataset? The tables I am querying against have millions
>> of records, and I'd like to verify my Hive query before I run it against all
>> records.
>>
>> Is there a way to test the query against a small subset of the data,
>> without going into full MapReduce? As silly as this sounds, is there a way
>> to MapReduce without the overhead of MapReduce? That way I can check my
>> query is doing what I want before I run it against all records.
>>
>> Thanks,
>>
>> -Kyle
>
>
>
>
> --
> Dean Wampler, Ph.D.
> thinkbiganalytics.com
> +1-312-339-1330
>