Hive, mail # user - Getting Slow Query Performance!


Re: Getting Slow Query Performance!
bejoy_ks@... 2013-03-12, 11:54
Hi

Since you are on a pseudo-distributed (single-node) environment, Hadoop MapReduce parallelism is limited.

You probably have only a few map slots, so map tasks may be sitting in the queue waiting for others to complete. On a larger cluster your job should run faster.
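To put rough numbers on the queueing point, the sketch below estimates how many map tasks a job over the ~30 GB from the original message would create, and how many sequential "waves" those tasks need when only a few can run at once. It assumes one map task per 64 MB split (the Hadoop 1.x default dfs.block.size) and 2 concurrent map slots per node (the Hadoop 1.x default mapred.tasktracker.map.tasks.maximum). Both are assumptions, so check the values in your own configuration. map_waves is a hypothetical helper, not a Hadoop API.

```python
from typing import Tuple

GiB = 2 ** 30
MiB = 2 ** 20


def map_waves(input_bytes: int, split_bytes: int, map_slots: int) -> Tuple[int, int]:
    """Hypothetical helper: estimate (map_tasks, waves) for one MapReduce job.

    Assumes one map task per input split of split_bytes, and that the
    cluster runs at most map_slots map tasks at the same time.
    """
    if input_bytes < 0 or split_bytes <= 0 or map_slots <= 0:
        raise ValueError("input must be non-negative; split size and slots positive")
    # Integer ceiling division: a partial last split still needs its own task.
    map_tasks = max(1, -(-input_bytes // split_bytes))
    # Tasks beyond the available slots wait in the queue for a later wave.
    waves = -(-map_tasks // map_slots)
    return map_tasks, waves


if __name__ == "__main__":
    # ~30 GB of input, 64 MB splits and 2 map slots (all assumed values).
    print(map_waves(30 * GiB, 64 * MiB, 2))   # (480, 240)
    # Same input on a hypothetical 10-node cluster with 2 map slots per node.
    print(map_waves(30 * GiB, 64 * MiB, 20))  # (480, 24)
```

Real split counts also depend on how many files Sqoop wrote (each file yields at least one split) and on compression, so treat these figures as an order-of-magnitude illustration rather than a prediction.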

As a side note, certain SQL queries that utilize indexing will be faster on a SQL database server than in Hive.
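As a toy illustration of why an index helps a selective filter such as SALARY.AMOUNT > 900000 in the original query, the sketch below compares a full scan (every row is read and tested, roughly what happens when Hive reads a plain table) with a range lookup over a sorted index on the amount column, counting the rows each path touches. The rows, the in-memory "index" and all function names are hypothetical. A real RDBMS uses on-disk structures such as B-trees, not a sorted Python list.

```python
import bisect
from typing import List, Tuple

Row = Tuple[int, int]  # hypothetical (employee_id, amount) row


def full_scan(rows: List[Row], threshold: int) -> Tuple[List[Row], int]:
    """Read and test every row; return (matches, rows_examined)."""
    matches = [row for row in rows if row[1] > threshold]
    return matches, len(rows)


def build_amount_index(rows: List[Row]) -> Tuple[List[int], List[int]]:
    """Toy index: row positions sorted by amount, plus the sorted amounts."""
    order = sorted(range(len(rows)), key=lambda i: rows[i][1])
    keys = [rows[i][1] for i in order]
    return keys, order


def index_range_scan(rows: List[Row], index: Tuple[List[int], List[int]],
                     threshold: int) -> Tuple[List[Row], int]:
    """Binary-search to the first amount > threshold, then fetch only those rows.

    The returned cost ignores the O(log n) comparisons of the binary search.
    """
    keys, order = index
    start = bisect.bisect_right(keys, threshold)
    matches = [rows[i] for i in order[start:]]
    return matches, len(matches)


if __name__ == "__main__":
    rows = [(emp_id, emp_id * 1000) for emp_id in range(1000)]
    scanned, scan_cost = full_scan(rows, 900000)
    indexed, index_cost = index_range_scan(rows, build_amount_index(rows), 900000)
    print(len(scanned), scan_cost)   # 99 1000
    print(len(indexed), index_cost)  # 99 99
```

Both paths return the same rows; the difference is that the full scan reads all 1000 rows while the indexed lookup fetches only the 99 that match.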

Regards
Bejoy KS


-----Original Message-----
From: Gobinda Paul <[EMAIL PROTECTED]>
Date: Tue, 12 Mar 2013 15:09:31
To: [EMAIL PROTECTED]<[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Getting Slow Query Performance!
I used Sqoop to import 30 GB of data (two tables: EMPLOYEE, approx. 21 GB, and SALARY, approx. 9 GB) into Hadoop (single node) via Hive.
I ran a sample query:

SELECT EMPLOYEE.ID, EMPLOYEE.NAME, EMPLOYEE.DEPT, SALARY.AMOUNT
FROM EMPLOYEE JOIN SALARY
WHERE EMPLOYEE.ID = SALARY.EMPLOYEE_ID AND SALARY.AMOUNT > 900000;
In Hive it takes approx. 15 minutes, whereas MySQL takes approx. 4.5 minutes to execute the same query.
CPU: Pentium(R) Dual-Core CPU E5700 @ 3.00GHz
RAM: 2 GB
HDD: 500 GB

Here is my hive-site.xml configuration:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>hive.hwi.listen.host</name>
    <value>0.0.0.0</value>
    <description>This is the host address the Hive Web Interface will listen on</description>
  </property>
  <property>
    <name>hive.hwi.listen.port</name>
    <value>9999</value>
    <description>This is the port the Hive Web Interface will listen on</description>
  </property>
  <property>
    <name>hive.hwi.war.file</name>
    <value>/lib/hive-hwi-0.9.0.war</value>
    <description>This is the WAR file with the jsp content for Hive Web Interface</description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>-1</value>
    <description>The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop sets this to 1 by default, whereas Hive uses -1 as its default value. By setting this property to -1, Hive will automatically figure out the number of reducers.</description>
  </property>
  <property>
    <name>hive.exec.reducers.bytes.per.reducer</name>
    <value>1000000000</value>
    <description>Size per reducer. The default is 1G, i.e. if the input size is 10G, it will use 10 reducers.</description>
  </property>
  <property>
    <name>hive.exec.reducers.max</name>
    <value>999</value>
    <description>Maximum number of reducers that will be used. If the value specified in the configuration parameter mapred.reduce.tasks is negative, Hive will use this as the maximum number of reducers when automatically determining the number of reducers.</description>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive-${user.name}</value>
    <description>Scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
</configuration>

Any ideas?
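The three reducer settings in the hive-site.xml above (mapred.reduce.tasks = -1, hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max) combine roughly as in the sketch below: the reducer count is the job's input size divided by the bytes-per-reducer value, rounded up, then clamped between 1 and the configured maximum. This is a simplified model of the estimate as Hive versions of this era appear to compute it, not Hive's actual code, and estimate_reducers is a hypothetical name.

```python
def estimate_reducers(total_input_bytes: int,
                      bytes_per_reducer: int = 1_000_000_000,
                      max_reducers: int = 999) -> int:
    """Simplified model of Hive's automatic reducer count (mapred.reduce.tasks = -1).

    The defaults mirror hive.exec.reducers.bytes.per.reducer and
    hive.exec.reducers.max from the configuration above.
    """
    if total_input_bytes < 0 or bytes_per_reducer <= 0 or max_reducers <= 0:
        raise ValueError("input must be non-negative; limits must be positive")
    # Round up so a partial chunk of input still gets a reducer.
    reducers = -(-total_input_bytes // bytes_per_reducer)
    # At least one reducer, and never more than the configured maximum.
    return max(1, min(max_reducers, reducers))


if __name__ == "__main__":
    print(estimate_reducers(10 * 10**9))  # 10, the example in the description
    print(estimate_reducers(30 * 10**9))  # 30 for ~30 GB of join input
```

On a single node, those ~30 reducers would still have to share the node's few reduce slots (2 by default in Hadoop 1.x via mapred.tasktracker.reduce.tasks.maximum, again an assumption about this setup), so they too would run in waves rather than all at once.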