Hive >> mail # user >> Use distribute to spread across reducers


Re: Use distribute to spread across reducers
Hi Keith,

Have you tried the TABLESAMPLE command?
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling
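
For example, something along these lines might work. This is only a sketch: the table names are the ones from your message, the 1-in-1000 ratio is an arbitrary assumption, and the result is a random sample of roughly that fraction of the rows rather than exactly 1000 rows.

```sql
-- Sketch, not tested against your data: sample about 1 row in 1000.
-- BUCKET 1 OUT OF 1000 ON rand() assigns each row to a random bucket
-- and keeps only the rows that land in bucket 1.
CREATE TABLE subset_table AS
SELECT *
FROM large_table TABLESAMPLE(BUCKET 1 OUT OF 1000 ON rand()) s;
```

The sampling is applied as the rows are read, so the work is spread across the mappers instead of being funneled through a single reducer the way LIMIT is. Since large_table is presumably not bucketed on rand(), Hive still has to scan the whole table.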

Tim
On Thu, Oct 3, 2013 at 11:58 AM, Yin Huai <[EMAIL PROTECTED]> wrote:

> Hello Keith,
>
> Hive will not launch a MR job for your query because it basically reads
> all columns from a table. Hive will fetch the data for you directly from
> the underlying filesystem.
>
> Thanks,
>
> Yin
>
>
>
> On Wed, Oct 2, 2013 at 2:48 PM, Keith Wiley <[EMAIL PROTECTED]> wrote:
>
>> I'm trying to create a subset of a large table for testing.  The
>> following approach works:
>>
>> create table subset_table as
>> select * from large_table limit 1000
>>
>> ...but it only uses one reducer.  I would like to speed up the process of
>> creating a subset by distributing across multiple reducers.  I already
>> tried explicitly setting mapred.reduce.tasks and hive.exec.reducers.max to
>> values larger than 1, but in this particular case, those values seem to be
>> over-ridden by Hive's internal query->to->mapreduce conversion; it ignores
>> those parameters.
>>
>> So, I tried this:
>>
>> create table subset_table as
>> select * from large_table limit 1000
>> distribute by column_name
>>
>> ...but that doesn't parse.  I get the following error:
>>
>> OK FAILED: ParseException line 3:0 missing EOF at 'distribute' near
>> '1000'.
>>
>> I have tried NUMEROUS applications of parentheses, nested queries, etc.
>>  For example, here's just one (amongst perhaps ten variations on a theme):
>>
>> create table subset_table as
>> select * from (
>> from (
>> select * from large_table limit 1000
>> distribute by column_name
>> )) s
>>
>> Like I said, I've tried all sorts of combinations of the elements shown
>> above.  So far I have not even gotten any syntax to parse, much less run.
>>  Only the original query at the top will even pass the parsing stage of
>> processing.
>>
>> Any ideas?
>>
>> Thanks.
>>
>>
>> ________________________________________________________________________________
>> Keith Wiley     [EMAIL PROTECTED]     keithwiley.com
>> music.keithwiley.com
>>
>> "I do not feel obliged to believe that the same God who has endowed us
>> with sense, reason, and intellect has intended us to forgo their use."
>>                                            --  Galileo Galilei
>>
>> ________________________________________________________________________________
>>
>>
>