Re: non map-reduce for simple queries
The total number of bytes of the input will be used to determine whether
we can skip launching a map-reduce job for this query. That was in my
original mail.

However, given any complex WHERE condition and the lack of column
statistics in Hive, we cannot determine the number of bytes that would
be needed to satisfy the WHERE condition.
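
To make the check concrete, here is a rough sketch in Python. It is not
Hive code: the function name and parameters are made up for illustration,
and the 200 MB default is only borrowed from the range suggested in the
quoted mail below. It compares the total input size against a limit and
deliberately ignores the WHERE condition, since without column statistics
we cannot estimate anything tighter than the full input.

```python
# Hypothetical sketch, not Hive's implementation. All names are assumptions.

MB = 1024 * 1024


def can_skip_map_reduce(total_input_bytes: int,
                        max_fetch_bytes: int = 200 * MB) -> bool:
    """Return True if the query could be answered by a local fetch.

    total_input_bytes: summed size of all input files for the query.
    max_fetch_bytes:   assumed upper limit on data pulled to the local box.

    The selectivity of any WHERE condition is ignored on purpose: with no
    column statistics, the full input size is the only safe upper bound on
    how many bytes must be read.
    """
    if total_input_bytes < 0:
        raise ValueError("total_input_bytes must be non-negative")
    return total_input_bytes <= max_fetch_bytes


if __name__ == "__main__":
    print(can_skip_map_reduce(50 * MB))    # small input: fetch locally
    print(can_skip_map_reduce(500 * MB))   # large input: launch map-reduce
```

The limit is a parameter so that it could be driven by a configuration
setting rather than being hard-coded.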

On 7/31/12 7:07 AM, "Navis류승우" <[EMAIL PROTECTED]> wrote:

>It supports table sampling also.
>
>select * from src TABLESAMPLE (BUCKET 1 OUT OF 40 ON key);
>select * from src TABLESAMPLE (0.25 PERCENT);
>
>But there is no sampling option specifying number of bytes. This can be
>done in another issue.
>
>2012/7/31 Owen O'Malley <[EMAIL PROTECTED]>
>
>> On Sat, Jul 28, 2012 at 6:17 PM, Navis류승우 <[EMAIL PROTECTED]> wrote:
>>
>> > I was thinking of timeout for fetching, 2000msec for example. How about
>> > that?
>> >
>>
>> Instead of time, which requires launching the query and letting it timeout,
>> how about determining the number of bytes that would need to be fetched to
>> the local box? Limiting it to 100 or 200 MB seems reasonable.
>>
>> -- Owen
>>