Keith Wiley 2013-03-22, 23:02
Instead of >=, can you just try = if you want to limit to the top 100? (b being a
partition, I guess it will have more than 100 records to fit into your limit.)
To improve your query performance, your table file format matters as well.
Which one are you using?
How many partitions are there?
What's the size of the cluster?
You can set the number of reducers, but if your query has just one key, then
only one reducer will get the data and the rest will run empty.
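
To illustrate the point about a single key, here is a toy sketch in Python. It is not Hive's or Hadoop's actual partitioner. It only assumes that records are routed to reducers by hashing the key modulo the reducer count. The names assign_reducer and reducer_loads, the crc32 hash, the 4 reducers and the 1000 records are all made up for the example.

```python
import zlib
from collections import Counter

def assign_reducer(key: str, num_reducers: int) -> int:
    # Hypothetical stand-in for a hash partitioner: the same key
    # always maps to the same reducer index.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def reducer_loads(keys, num_reducers):
    # Count how many records each reducer would receive.
    loads = Counter(assign_reducer(k, num_reducers) for k in keys)
    return [loads.get(r, 0) for r in range(num_reducers)]

# One distinct key: a single reducer receives all 1000 records,
# and the other three receive nothing.
single_key_loads = reducer_loads(["c"] * 1000, 4)

# Many distinct keys: the records can spread across reducers.
many_key_loads = reducer_loads([f"k{i}" for i in range(1000)], 4)

print(single_key_loads)
print(many_key_loads)
```

Under that assumption, raising the reducer count does not help when every record carries the same key, because the extra reducers have nothing to do.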
On Sat, Mar 23, 2013 at 4:32 AM, Keith Wiley <[EMAIL PROTECTED]> wrote:
> The following query translates into a many-map-single-reduce job (which is
> common) and also slogs through the reduce stage... it's killing the overall
> select * from a where b >= 'c' order by b desc limit 100
> Note that b is a partition. What component is making the reducer heavy?
> Is it the order by or the limit (I'm sure it's not the partition-specific
> where clause, right?)? Are there ways to improve its performance?
> Keith Wiley [EMAIL PROTECTED] keithwiley.com
> "You can scratch an itch, but you can't itch a scratch. Furthermore, an
> itch can itch but a scratch can't scratch. Finally, a scratch can itch,
> but an itch can't scratch. All together this implies: He scratched the
> itch from the scratch that itched but would never itch the scratch from
> the itch that scratched."
> -- Keith Wiley
Keith Wiley 2013-03-24, 13:29