iwannaplay games 2012-09-05, 06:19
Re: Improving query performance on hive and hdfs
Vasco Visser 2012-09-05, 09:21
You know that Hadoop is not designed for low latency. To say anything
useful I think you should share some more details:
- What query are you launching (does it have a join or group by)?
- How many mappers/reducers and jobs does the query spawn?
- What does your data look like?
- Also, what version of Hadoop are you running, etc.?
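One way to gather the job and stage information is Hive's EXPLAIN statement, which prints the plan without running the query. A minimal sketch, assuming a hypothetical table named events with an action column (substitute your own query):

```sql
-- Hypothetical table and column names, for illustration only.
-- EXPLAIN prints the stage dependencies and the operators in each stage,
-- which shows how many MapReduce stages the query compiles to.
EXPLAIN
SELECT action, COUNT(*) AS n
FROM events
GROUP BY action;
```

The actual mapper and reducer counts are reported in the console output when the query itself runs.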
Some things that may apply, depending on the answers above:
- Check if you can partition your data so that Hive can do partition pruning.
- If your query has joins then look at
https://cwiki.apache.org/Hive/languagemanual-joins.html (bottom of
page) to see how to organize your data to let Hive do a map side join.
- Try tuning the config option
mapreduce.job.reduce.slowstart.completedmaps; this can help if you
have a lot of reducers sitting idle during the map phase.
- I would try to limit the number of tasks per node to the number of
CPUs on the system, but I don't know if this is common practice.
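A rough HiveQL sketch of the first three suggestions. The table names, column names, and the 0.80 value are made up for illustration, so treat this as a starting point under those assumptions rather than a recipe:

```sql
-- Hypothetical schema: events partitioned by day, plus a small users table.
CREATE TABLE events_by_day (
  user_id BIGINT,
  action  STRING
)
PARTITIONED BY (event_date STRING);

-- Partition pruning: filtering on the partition column lets Hive read
-- only the matching partition instead of scanning the whole table.
SELECT action, COUNT(*) AS n
FROM events_by_day
WHERE event_date = '2012-09-01'
GROUP BY action;

-- Map-side join: the MAPJOIN hint asks Hive to load the small table (u)
-- into memory in each mapper, so the join needs no reduce phase.
-- This only makes sense if users fits in memory.
SELECT /*+ MAPJOIN(u) */ e.action, u.country
FROM events_by_day e
JOIN users u ON (e.user_id = u.user_id)
WHERE e.event_date = '2012-09-01';

-- Reducer slow start: fraction of map tasks that must finish before
-- reducers are scheduled. 0.80 is an arbitrary example value.
-- On Hadoop 1.x the property is named mapred.reduce.slowstart.completed.maps.
SET mapreduce.job.reduce.slowstart.completedmaps=0.80;
```

The per-node task limit is not a per-query setting. On Hadoop 1.x it is configured on each TaskTracker in mapred-site.xml through mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum.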
On Wed, Sep 5, 2012 at 8:19 AM, iwannaplay games
<[EMAIL PROTECTED]> wrote:
> Hi all,
> I ran a query on Hive on top of 90 million records that took 12 minutes to
> execute and the same query on SQL Server took 8 minutes. My question is how can I
> make Hadoop's performance better. What all configurations will improve the
> Thanks & Regards