Hive >> mail # user >> Query crawls through reducer


Keith Wiley 2013-03-22, 23:02
Re: Query crawls through reducer
Instead of >=, can you just try = if you want to limit to the top 100? (b
being a partition, I guess it will have more than 100 records to fit into
your limit.)

To improve your query performance, your table file format matters as well.
Which one are you using?
How many partitions are there?
What's the size of the cluster?
You can set the number of reducers, but if your query has just one key then
only one reducer will get the data and the rest will run empty.
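
A rough way to picture that last point, as a sketch only and not Hive's
actual partitioner: rows are routed to reducers by a hash of their key, so a
single distinct key can only ever land on one reducer. The names
assign_to_reducers and busy_reducers are made up for illustration, and crc32
is just a stand-in for whatever hash the framework really uses.

```python
import zlib
from collections import defaultdict


def assign_to_reducers(keys, num_reducers):
    """Hash-partition keys across reducers, roughly as a shuffle would.

    Assumption: crc32 stands in for the framework's real partitioner hash.
    """
    buckets = defaultdict(list)
    for key in keys:
        reducer = zlib.crc32(key.encode("utf-8")) % num_reducers
        buckets[reducer].append(key)
    return buckets


def busy_reducers(keys, num_reducers):
    """Count how many reducers received at least one row."""
    return len(assign_to_reducers(keys, num_reducers))


if __name__ == "__main__":
    # Every row shares one key, so all rows go to the same reducer.
    single_key_rows = ["c"] * 1000
    # Many distinct keys spread across the available reducers.
    many_key_rows = [f"key-{i}" for i in range(1000)]

    print("single key:", busy_reducers(single_key_rows, 8), "of 8 reducers busy")
    print("many keys: ", busy_reducers(many_key_rows, 8), "of 8 reducers busy")
```

With one key, asking for 8 reducers still leaves 7 of them idle, so raising
the reducer count alone would not help in that case.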

On Sat, Mar 23, 2013 at 4:32 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:

> The following query translates into a many-map-single-reduce job (which is
> common) and also slags through the reduce stage...it's killing the overall
> query:
>
> select * from a where b >= 'c' order by b desc limit 100
>
> Note that b is a partition.  What component is making the reducer heavy?
>  Is it the order by or the limit (I'm sure it's not the partition-specific
> where clause, right?)?  Are there ways to improve its performance?
>
>
> ________________________________________________________________________________
> Keith Wiley     [EMAIL PROTECTED]     keithwiley.com
> music.keithwiley.com
>
> "You can scratch an itch, but you can't itch a scratch. Furthermore, an
> itch can itch but a scratch can't scratch. Finally, a scratch can itch,
> but an itch can't scratch. All together this implies: He scratched the
> itch from the scratch that itched but would never itch the scratch from
> the itch that scratched."
>                                            --  Keith Wiley
>
> ________________________________________________________________________________
>
>
--
Nitin Pawar
Keith Wiley 2013-03-24, 13:29