Hive >> mail # user >> Use distribute to spread across reducers


Re: Use distribute to spread across reducers
Hello Keith,

Hive will not launch an MR job for your query because it simply reads all
columns from a table. Hive will fetch the data for you directly from the
underlying filesystem.
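
One way to check which plan Hive picks, as a sketch: EXPLAIN prints the
query plan without running the query. The table names are the placeholders
from the question below, and the exact plan output depends on the Hive
version and settings, so none is shown here.

```sql
-- Plain select with a limit: if Hive serves this as a direct fetch,
-- the plan would be expected to contain no map-reduce stage.
EXPLAIN
SELECT * FROM large_table LIMIT 1000;

-- The CTAS form from the question, for comparison.
EXPLAIN
CREATE TABLE subset_table AS
SELECT * FROM large_table LIMIT 1000;
```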

Thanks,

Yin

On Wed, Oct 2, 2013 at 2:48 PM, Keith Wiley <[EMAIL PROTECTED]> wrote:

> I'm trying to create a subset of a large table for testing.  The following
> approach works:
>
> create table subset_table as
> select * from large_table limit 1000
>
> ...but it only uses one reducer.  I would like to speed up the process of
> creating a subset by distributing across multiple reducers.  I already
> tried explicitly setting mapred.reduce.tasks and hive.exec.reducers.max to
> values larger than 1, but in this particular case, those values seem to be
> overridden by Hive's internal query-to-MapReduce conversion; it ignores
> those parameters.
>
> So, I tried this:
>
> create table subset_table as
> select * from large_table limit 1000
> distribute by column_name
>
> ...but that doesn't parse.  I get the following error:
>
> OK FAILED: ParseException line 3:0 missing EOF at 'distribute' near '1000'.
>
> I have tried NUMEROUS applications of parentheses, nested queries, etc.
> For example, here's just one (amongst perhaps ten variations on a theme):
>
> create table subset_table as
> select * from (
> from (
> select * from large_table limit 1000
> distribute by column_name
> )) s
>
> Like I said, I've tried all sorts of combinations of the elements shown
> above.  So far I have not even gotten any syntax to parse, much less run.
> Only the original query at the top will even pass the parsing stage of
> processing.
>
> Any ideas?
>
> Thanks.
>
>
> ________________________________________________________________________________
> Keith Wiley     [EMAIL PROTECTED]     keithwiley.com
> music.keithwiley.com
>
> "I do not feel obliged to believe that the same God who has endowed us with
> sense, reason, and intellect has intended us to forgo their use."
>                                            --  Galileo Galilei
>
> ________________________________________________________________________________
>
>
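
On the parse error quoted above, a minimal sketch, assuming Hive's documented
clause order, in which DISTRIBUTE BY comes before LIMIT. The table and column
names are the placeholders from the question. This is untested against the
Hive version used in this thread, and the final LIMIT may still be applied in
a single-reducer stage, so it may parse without making the job any faster.

```sql
-- Sketch only: large_table, subset_table and column_name are the
-- placeholder names from the question.
-- LIMIT is the last clause of a Hive SELECT, so DISTRIBUTE BY goes before it.
CREATE TABLE subset_table AS
SELECT * FROM large_table
DISTRIBUTE BY column_name
LIMIT 1000;
```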