Hive >> mail # user >> Setting up stats database


Setting up stats database
Hi,

I'm trying to use PostgreSQL as the stats database, and I made the following settings
in hive-site.xml:
<property>
  <name>hive.stats.dbclass</name>
  <value>jdbc:postgresql</value>
  <description>The default database that stores temporary hive
statistics.</description>
</property>

<property>
  <name>hive.stats.autogather</name>
  <value>true</value>
  <description>A flag to gather statistics automatically during the
INSERT OVERWRITE command.</description>
</property>

<property>
  <name>hive.stats.jdbcdriver</name>
  <value>org.postgresql.Driver</value>
  <description>The JDBC driver for the database that stores temporary
hive statistics.</description>
</property>

<property>
  <name>hive.stats.dbconnectionstring</name>
  <value>jdbc:postgresql://localhost/hive_statsdb?createDatabaseIfNotExist=true;user=hive;password=pwd</value>
  <description>The default connection string for the database that
stores temporary hive statistics.</description>
</property>
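In the error further down, the failure occurs inside `JDBCStatsPublisher.connect`, which calls `Class.forName`. As a rough illustration of what the two JDBC properties above feed into, here is a minimal sketch. It is not Hive's actual code. It loads the class named by `hive.stats.jdbcdriver`, then asks `DriverManager` for a connection using `hive.stats.dbconnectionstring`, and reports which step failed. The class and method names (`StatsDbProbe`, `probe`) are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical probe: approximates (does not reproduce) how a JDBC stats
// publisher uses hive.stats.jdbcdriver and hive.stats.dbconnectionstring.
public class StatsDbProbe {

    /** Returns a short status string describing which step failed, if any. */
    static String probe(String driverClass, String connectionString) {
        try {
            // Step 1: load the driver class named by hive.stats.jdbcdriver.
            // A ClassNotFoundException here means the driver jar is not on
            // the classpath of the JVM running this code.
            Class.forName(driverClass);
        } catch (ClassNotFoundException e) {
            return "driver not found: " + driverClass;
        }
        // Step 2: open (and immediately close) a connection using
        // hive.stats.dbconnectionstring.
        try (Connection conn = DriverManager.getConnection(connectionString)) {
            return "connected";
        } catch (SQLException e) {
            return "connection failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Values copied from the hive-site.xml settings above.
        System.out.println(probe(
            "org.postgresql.Driver",
            "jdbc:postgresql://localhost/hive_statsdb?createDatabaseIfNotExist=true;user=hive;password=pwd"));
    }
}
```

If you run this with the same classpath as the process that publishes stats, a missing driver jar should show up as `driver not found: org.postgresql.Driver`. That is the same condition as the `ClassNotFoundException` in the log.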

I also use PostgreSQL as the Hive metastore database, so the
postgresql-9.0-801.jdbc4.jar file is already in Hive's lib directory.

After running 'analyze table t1 partition(dt) compute statistics;' in the Hive
CLI, it outputs some stats info in the CLI, but nothing is written to the
database. I found the following errors in the log:

1-08-15 14:54:54,767 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Stats Gathering found a new partition spec = dt=20110805
2011-08-15 14:54:54,767 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
2011-08-15 14:54:54,767 INFO ExecMapper: ExecMapper: processing 1 rows: used memory = 39953640
2011-08-15 14:54:54,768 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 1 finished. closing...
2011-08-15 14:54:54,768 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 1 forwarded 2 rows
2011-08-15 14:54:54,768 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2011-08-15 14:54:54,768 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
2011-08-15 14:54:54,768 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 2 rows
2011-08-15 14:54:54,772 ERROR org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher: Error during JDBC connection to jdbc:postgresql://localhost/hive_statsdb?createDatabaseIfNotExist=true;user=hive;password=pwd.
java.lang.ClassNotFoundException: org.postgresql.Driver
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:169)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.connect(JDBCStatsPublisher.java:55)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.publishStats(TableScanOperator.java:202)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.closeOp(TableScanOperator.java:164)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
2011-08-15 14:54:54,774 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: StatsPublishing error: cannot connect to database.
2011-08-15 14:54:54,774 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 1 Close done
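The bottom frames of the trace (`MapTask.run`, `Child.main`) show that the error is raised in a Hadoop map-task JVM, not in the Hive CLI process. A jar in Hive's lib directory is therefore not necessarily visible there. A quick way to check a given JVM is to try loading the driver class and to look for a PostgreSQL jar in its `java.class.path`. The sketch below does both. The class name `DriverClasspathCheck` and the `"postgresql"` substring match are assumptions made for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: reports whether a JDBC driver class is loadable
// in the current JVM and which classpath entries look like the driver jar.
public class DriverClasspathCheck {

    /** True if the named class can be loaded by this class's class loader. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Classpath entries whose path contains the given substring. */
    static List<String> matchingEntries(String classPath, String needle) {
        List<String> hits = new ArrayList<>();
        // File.pathSeparator is ":" on Unix and ";" on Windows.
        for (String entry : classPath.split(File.pathSeparator)) {
            if (entry.contains(needle)) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path", "");
        System.out.println("org.postgresql.Driver loadable: "
            + isLoadable("org.postgresql.Driver"));
        System.out.println("postgresql entries on java.class.path: "
            + matchingEntries(cp, "postgresql"));
    }
}
```

This only inspects `java.class.path`, so jars made available through other class loaders will not appear in the second line. The `isLoadable` result is the more direct test.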