Hive >> mail # user >> Re: Want to improve the performance for execution of Hive Jobs.


Re: Want to improve the performance for execution of Hive Jobs.
1) Check the JobTracker URL to see how many mappers and reducers have been
launched.
2) If you have a large dataset and want the query to run faster, set
mapred.min.split.size and mapred.max.split.size to values that make the
input splits smaller, so that more mappers are launched in parallel and each
one finishes sooner.
3) If you are doing joins, there are different ways to go depending on the
data you have and its size.
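
To make point 2 concrete, here is a rough Python sketch of how the split-size settings relate to the number of mappers. It assumes the split size is computed as max(min size, min(max size, block size)), as Hadoop's FileInputFormat does, and it ignores details such as multiple files and split combining, so the mapper count Hive actually launches may differ depending on the input format in use. The function names and the sizes in the example are made up for illustration.

```python
import math

MB = 1024 * 1024


def split_size(block_size, min_size, max_size):
    """Effective split size: max(min_size, min(max_size, block_size))."""
    return max(min_size, min(max_size, block_size))


def estimate_mappers(file_size, block_size, min_size, max_size):
    """Rough mapper count for one splittable file (an approximation only)."""
    return math.ceil(file_size / split_size(block_size, min_size, max_size))


def hive_set_statements(min_size, max_size):
    """Build the SET statements to run in the Hive session before the query."""
    return [
        f"SET mapred.min.split.size={min_size};",
        f"SET mapred.max.split.size={max_size};",
    ]


if __name__ == "__main__":
    # Hypothetical numbers: a 10 GB file stored with 64 MB blocks.
    file_size = 10 * 1024 * MB
    block_size = 64 * MB

    # Compare the estimate with the max split at block size and at half of it.
    for max_size in (64 * MB, 32 * MB):
        mappers = estimate_mappers(file_size, block_size, 1, max_size)
        print(f"max split {max_size // MB} MB: about {mappers} mappers")

    for statement in hive_set_statements(1, 32 * MB):
        print(statement)
```

Halving the maximum split size roughly doubles the estimated mapper count, but that only helps while the cluster has free map slots to run them, so compare the estimate against what the JobTracker page reports.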

It would be helpful if you could let us know your data sizes and query details.

On Tue, May 8, 2012 at 10:07 AM, Bhavesh Shah <[EMAIL PROTECTED]> wrote:

> Hello all,
> I have written Hive JDBC code and packaged it as a JAR. I am running that
> JAR on a 10-node cluster.
> The problem is that even with the 10-node cluster, the performance is the
> same as on a single-node cluster.
>
> What can I do to improve the performance of Hive jobs? Is there any
> configuration setting to set before submitting Hive jobs to the cluster?
> One more thing I want to know: how can we tell whether the job is
> running on all nodes of the cluster?
>
> Please let me know if anyone knows about this.
>
> --
> Regards,
> Bhavesh Shah
>
>
--
Nitin Pawar