Hive >> mail # user >> Query crawls through reducer

Re: Query crawls through reducer
Instead of >=, can you just try = if you want to limit to the top 100? (b being a
partition, I guess it will have more than 100 records to fit into your

To improve your query performance, your table file format matters as well.
Which one are you using?
How many partitions are there?
What's the size of the cluster?
You can set the number of reducers, but if your query has just one key, then
only one reducer will get the data and the rest will run empty.
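
As a rough sketch of the two suggestions above (not tested against your data): the table a, partition column b, and value 'c' are taken from the quoted query. The reducer count of 8 is an arbitrary illustrative value, and mapred.reduce.tasks is the Hadoop 1.x property name, which I am assuming matches your setup.

```sql
-- Equality on the partition column: every matching row has the same b,
-- so ORDER BY b adds nothing and can be dropped along with its reduce-side sort.
SELECT * FROM a WHERE b = 'c' LIMIT 100;

-- Requesting more reducers for the original query (8 is only an example value).
-- ORDER BY needs a total order, so the sort is still expected to go through
-- a single reducer no matter how many are requested.
SET mapred.reduce.tasks=8;
SELECT * FROM a WHERE b >= 'c' ORDER BY b DESC LIMIT 100;
```

If the second form runs no faster than before, that would point to the ORDER BY as the part that makes the reducer heavy.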

On Sat, Mar 23, 2013 at 4:32 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:

> The following query translates into a many-map-single-reduce job (which is
> common) and also slogs through the reduce stage...it's killing the overall
> query:
> select * from a where b >= 'c' order by b desc limit 100
> Note that b is a partition.  What component is making the reducer heavy?
>  Is it the order by or the limit (I'm sure it's not the partition-specific
> where clause, right?)?  Are there ways to improve its performance?
> ________________________________________________________________________________
> Keith Wiley     [EMAIL PROTECTED]     keithwiley.com
> music.keithwiley.com
> "You can scratch an itch, but you can't itch a scratch. Furthermore, an itch can
> itch but a scratch can't scratch. Finally, a scratch can itch, but an itch can't
> scratch. All together this implies: He scratched the itch from the scratch that
> itched but would never itch the scratch from the itch that scratched."
>                                            --  Keith Wiley
> ________________________________________________________________________________
Nitin Pawar