Hive >> mail # user >> Hive performance-how to increase ?


Hive performance: how to increase it?
Respected sir,

     I am working with a 2.5 GB database whose tables range from only 40 rows
to about 9 million rows. Queries on the large tables take a long time,
and I would like to get results faster.

For example, even a query on a small table (5566 rows) takes about 50 seconds:
========================================================================
hive> select count(*) from cidade;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201210300724_0003, Tracking URL http://localhost:50030/jobdetails.jsp?jobid=job_201210300724_0003
Kill Command = /home/trendwise/Hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201210300724_0003
2012-10-30 07:37:41,588 Stage-1 map = 0%,  reduce = 0%
2012-10-30 07:37:57,493 Stage-1 map = 100%,  reduce = 0%
2012-10-30 07:38:17,905 Stage-1 map = 100%,  reduce = 33%
2012-10-30 07:38:20,965 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201210300724_0003
OK
5566
Time taken: 50.172 seconds
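To see where those 50 seconds go, the Stage-1 progress lines can be turned into rough phase durations. The sketch below uses hypothetical helpers (`parse_progress` and `phase_durations` are not part of Hive). Because Hive prints progress only periodically, the boundaries are approximate, and the gap between the progress lines and "Time taken" lumps together query compilation, job submission and setup, and fetching the result.

```python
from datetime import datetime

# Stage-1 progress lines copied verbatim from the console output above.
LOG = """\
2012-10-30 07:37:41,588 Stage-1 map = 0%,  reduce = 0%
2012-10-30 07:37:57,493 Stage-1 map = 100%,  reduce = 0%
2012-10-30 07:38:17,905 Stage-1 map = 100%,  reduce = 33%
2012-10-30 07:38:20,965 Stage-1 map = 100%,  reduce = 100%
"""
TOTAL_SECONDS = 50.172  # "Time taken" reported by the Hive CLI


def _percent(text, key):
    """Extract the integer after e.g. 'map = ' up to the '%' sign."""
    return int(text.split(key + " = ")[1].split("%")[0])


def parse_progress(log):
    """Return (timestamp, map_pct, reduce_pct) for each progress line (hypothetical helper)."""
    rows = []
    for line in log.strip().splitlines():
        stamp = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
        rest = line[23:]
        rows.append((stamp, _percent(rest, "map"), _percent(rest, "reduce")))
    return rows


def phase_durations(rows, total_seconds):
    """Approximate seconds spent in map, reduce, and outside the progress reports."""
    first, last = rows[0][0], rows[-1][0]
    map_done = next(t for t, map_pct, _ in rows if map_pct == 100)
    reported = (last - first).total_seconds()
    return {
        "map": round((map_done - first).total_seconds(), 3),
        # includes shuffle as well as the reduce itself
        "reduce": round((last - map_done).total_seconds(), 3),
        # compilation, job submission/setup and result fetching
        "outside_reports": round(total_seconds - reported, 3),
    }


print(phase_durations(parse_progress(LOG), TOTAL_SECONDS))
```

For this log the result is roughly 15.9 s of map, 23.5 s of shuffle and reduce, and 10.8 s outside the progress reports. Counting 5566 rows should itself be nearly instantaneous, which suggests most of the time is fixed per-job MapReduce overhead rather than data processing.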
================================================================================================================
hdfs-site.xml:

<configuration>
<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>

<property>
  <name>dfs.block.size</name>
  <value>131072</value>
  <description>The default block size for new files, in bytes.
  </description>
</property>
</configuration>
Do these settings affect Hive performance?
dfs.replication=3
dfs.block.size=131072
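Note that `dfs.block.size` is in bytes, so 131072 is only 128 KB, far below the Hadoop 0.20 default of 67108864 (64 MB), and it applies only to files written after the change. Assuming each block becomes one input split and hence one map task (plain `FileInputFormat` behaviour; Hive's input format may combine small splits), the arithmetic below, using a hypothetical `block_count` helper and treating the whole database as a single file, shows how many map tasks 2.5 GB of data would need.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def block_count(file_bytes, block_bytes):
    """Blocks needed to store file_bytes; the last block may be partially filled."""
    return math.ceil(file_bytes / block_bytes)


# Simplification: treat the whole 2.5 GB database as a single file.
DATA_BYTES = int(2.5 * GB)

CANDIDATES = [
    ("131072 bytes (128 KB, as configured)", 131072),
    ("67108864 bytes (64 MB, Hadoop 0.20 default)", 64 * MB),
    ("134217728 bytes (128 MB)", 128 * MB),
]

for label, block_bytes in CANDIDATES:
    # Assumption: one input split, hence one map task, per block.
    print(f"{label}: {block_count(DATA_BYTES, block_bytes)} blocks / map tasks")
```

With 128 KB blocks this comes to 20480 map tasks versus 40 at 64 MB, and each map task carries its own startup cost.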

Can I set them from the Hive prompt, for example
hive> set dfs.replication=5;
If so, does the value apply only to that session,
or is it better to change it in the .xml file?
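The job tracker address `localhost:54311` suggests a single-machine (pseudo-distributed) setup. HDFS never keeps two replicas of the same block on one DataNode, so there a replication factor of 3 or 5 still yields a single physical copy, with the missing replicas reported as under-replicated. On a real cluster each extra replica costs disk space and write bandwidth. The helpers below are hypothetical, a back-of-the-envelope sketch that ignores rack placement and non-HDFS overhead.

```python
def effective_replicas(requested, live_datanodes):
    """HDFS keeps at most one replica of a given block on each DataNode."""
    return min(requested, live_datanodes)


def raw_storage_gb(data_gb, requested, live_datanodes):
    """Cluster-wide disk use once replication is satisfied as far as the node count allows."""
    return data_gb * effective_replicas(requested, live_datanodes)


# Hypothetical scenarios for the 2.5 GB database.
for nodes in (1, 5):
    for replication in (3, 5):
        print(f"{nodes} DataNode(s), dfs.replication={replication}: "
              f"{effective_replicas(replication, nodes)} copies, "
              f"{raw_storage_gb(2.5, replication, nodes)} GB on disk")
```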

What other settings should I change to increase performance?
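The log above points at `hive.exec.reducers.bytes.per.reducer` and `hive.exec.reducers.max`, which govern how Hive estimates the reducer count when it is not fixed at compile time (a global `count(*)` always uses one reducer, as the log shows). The sketch below assumes the estimate is ceil(input bytes / bytes per reducer), clamped to [1, max], with defaults of 1,000,000,000 bytes and 999 as in Hive releases of that era; `estimate_reducers` is a hypothetical helper, so check `hive-default.xml` for your version.

```python
import math

# Assumed defaults from hive-default.xml of that era; later releases changed
# hive.exec.reducers.bytes.per.reducer.
DEFAULT_BYTES_PER_REDUCER = 1_000_000_000
DEFAULT_MAX_REDUCERS = 999


def estimate_reducers(input_bytes,
                      bytes_per_reducer=DEFAULT_BYTES_PER_REDUCER,
                      max_reducers=DEFAULT_MAX_REDUCERS):
    """ceil(input / bytes_per_reducer), clamped to the range [1, max_reducers]."""
    wanted = math.ceil(input_bytes / bytes_per_reducer)
    return min(max_reducers, max(1, wanted))


total = int(2.5 * 1024 ** 3)  # the 2.5 GB database from the question
print(estimate_reducers(total))                                     # with defaults
print(estimate_reducers(total, bytes_per_reducer=256 * 1024 ** 2))  # smaller load per reducer
```

Lowering the bytes-per-reducer value raises parallelism for large GROUP BY or JOIN queries, at the cost of more tasks to schedule.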

Sagar Nikam
Trendwise Analytics
Bangalore, INDIA