problem in Hive performance
sagar nikam 2012-10-30, 13:51
Respected sir,
I am working with a database (2.5 GB) whose tables range from only 40 rows to 9 million rows. Any query against a large table takes a long time, and I want results in less time. Even a small query is slow:

========================================================================
hive> select count(*) from cidade;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201210300724_0003, Tracking URL http://localhost:50030/jobdetails.jsp?jobid=job_201210300724_0003
Kill Command = /home/trendwise/Hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201210300724_0003
2012-10-30 07:37:41,588 Stage-1 map = 0%, reduce = 0%
2012-10-30 07:37:57,493 Stage-1 map = 100%, reduce = 0%
2012-10-30 07:38:17,905 Stage-1 map = 100%, reduce = 33%
2012-10-30 07:38:20,965 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201210300724_0003
OK
5566
Time taken: 50.172 seconds
========================================================================

hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>131072</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
</configuration>

Do these settings affect the performance of Hive?
dfs.replication=3
dfs.block.size=131072

Can I set a value from the Hive prompt, as in
hive> set dfs.replication=5
Does this value remain in effect for that particular session only, or is it better to change it in the .xml file? Which other settings should I change to increase performance?

Sagar Nikam
Trendwise Analytics
Bangalore, INDIA
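
For readers weighing the dfs.block.size question above, a rough back-of-the-envelope sketch may help. It only does arithmetic on the numbers in the post: 131072 bytes is 128 KB, which is far below the 64 MB default that Hadoop 0.20.2 ships with. The helper name block_count is hypothetical, and the calculation assumes the 2.5 GB is stored as one contiguous file and that "GB" means 1024^3 bytes. Real tables are split across files, so the true block count will differ, and the number of map tasks also depends on the input format Hive uses.

```python
def block_count(total_bytes: int, block_size: int) -> int:
    """Number of HDFS blocks needed for total_bytes (ceiling division)."""
    return -(-total_bytes // block_size)

# Assumption: the whole 2.5 GB database is treated as a single file.
DATA_BYTES = int(2.5 * 1024**3)

CONFIGURED_BLOCK = 131072            # value from the posted hdfs-site.xml (128 KB)
DEFAULT_BLOCK = 64 * 1024 * 1024     # Hadoop 0.20.2 default dfs.block.size (64 MB)

blocks_configured = block_count(DATA_BYTES, CONFIGURED_BLOCK)
blocks_default = block_count(DATA_BYTES, DEFAULT_BLOCK)

print(f"blocks at 128 KB: {blocks_configured}")
print(f"blocks at 64 MB:  {blocks_default}")
```

Under these assumptions the configured value yields several hundred times more blocks than the default for the same data, which is why the block size is worth a second look when large-table queries are slow.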
Bharath Ganesh 2012-11-08, 08:06