Hive, mail # user - hcatalog takes minutes talking to mysql metadata


Re: hcatalog takes minutes talking to mysql metadata
Eugene Koifman 2013-08-28, 18:43
Perhaps HIVE-4914 is relevant.
On Wed, Aug 28, 2013 at 3:11 AM, Michał Czerwiński <[EMAIL PROTECTED]> wrote:

> It is also worth mentioning that I have tried running the
> 0.4.0-cdh4.3.0-SNAPSHOT jars (from
> https://repository.cloudera.com/content/groups/public/org/apache/hcatalog/hcatalog-core/)
> and hit exactly the same issue. That could indicate that the problem is
> related to the hive-metastore component itself and the way it interacts
> with the metastore. Thoughts?
>
>
> On 27 August 2013 18:41, Michał Czerwiński <[EMAIL PROTECTED]> wrote:
>
>> In Pig I am running a query like this:
>>
>> sdp1 = load 'db1.table1' using org.apache.hcatalog.pig.HCatLoader;
>> sdp = FILTER sdp1 BY key1=='value1' AND key2=='value2';
>> ll = LIMIT sdp 100;
>> dump ll;
>>
>> and hcatalog then spends several minutes querying MySQL for metadata.
>> Meanwhile, after a few seconds, Pig fails with:
>> org.apache.thrift.transport.TTransportException:
>> java.net.SocketTimeoutException: Read timed out
>>
>> Number of partitions I have:
>> hive -e 'use db1; show partitions table1' |wc -l
>> Time taken: 1.467 seconds
>> 37748
>>
>> When I run the same query in a different environment where I have only
>> ~1000 partitions, everything works fine.
>>
>> The problem also does not occur on CDH3 with hcatalog-0.4.0.
>>
>> In hcatalog's logs I can see the following
>> (note the timestamps; I ran the query at 17:10:45,216):
>>
>> 2013-08-27 17:10:46,275 INFO  DataNucleus.MetaData
>> (Log4JLogger.java:info(77)) - Listener found initialisation for persistable
>> class org.apache.hadoop.hive.metastore.model.MPartition
>>
>> 2013-08-27 17:14:23,661 DEBUG metastore.ObjectStore
>> (ObjectStore.java:listMPartitionsByFilter(1832)) - Done retrieving all
>> objects for listMPartitionsByFilter
>>
>> 2013-08-27 17:22:32,410 INFO  metastore.ObjectStore
>> (ObjectStore.java:getPartitionsByFilter(1699)) - # parts after pruning = 37748
>>
>> After that, hcatalog carries on until:
>> 2013-08-27 17:30:14,631 DEBUG DataNucleus.Transaction
>> (Log4JLogger.java:debug(58)) - Transaction committed in 462221 ms
>>
>> Please note that I have DataNucleus logging set to DEBUG, which slows
>> things down significantly. Without it, hcatalog still takes around
>> 7 minutes to settle.
>>
>> These are the DataNucleus settings from hcatalog's logs:
>>
>>  datanucleus.autoStartMechanismMode = checked
>>  javax.jdo.option.Multithreaded = true
>>  datanucleus.identifierFactory = datanucleus
>>  datanucleus.transactionIsolation = read
>>  datanucleus.validateTables = false
>>  javax.jdo.option.ConnectionURL = jdbc:mysql://XXX
>>  javax.jdo.option.DetachAllOnCommit = true
>>  javax.jdo.option.NonTransactionalRead = true
>>  datanucleus.validateConstraints = false
>>  javax.jdo.option.ConnectionDriverName = com.mysql.jdbc.Driver
>>  javax.jdo.option.ConnectionUserName = hive
>>  datanucleus.validateColumns = false
>>  datanucleus.cache.level2 = false
>>  datanucleus.plugin.pluginRegistryBundleCheck = LOG
>>  datanucleus.cache.level2.type = none
>>  javax.jdo.PersistenceManagerFactoryClass = org.datanucleus.jdo.JDOPersistenceManagerFactory
>>  datanucleus.autoCreateSchema = true
>>  datanucleus.storeManagerType = rdbms
>>  datanucleus.connectionPoolingType = DBCP
>>
>> This runs on CDH4 4.3.0
>> hcatalog version: 0.5.0+9-1.cdh4.3.0.p0.12~precise-cdh4.3.0
>>
>> Ideas?
>>
>
>
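
To make the timeline in the quoted log excerpts easier to read, here is a minimal Python sketch that computes the gaps between the four timestamps quoted above. The event labels and the function name phase_durations are illustrative assumptions, not anything from hcatalog or Hive; only the timestamps come from the logs.

```python
from datetime import datetime

# Log4j-style timestamp as it appears in the quoted log lines.
FMT = "%Y-%m-%d %H:%M:%S,%f"

# Timestamps copied from the log excerpts in this thread.
# The labels are informal descriptions added for readability.
EVENTS = [
    ("MPartition metadata initialised", "2013-08-27 17:10:46,275"),
    ("listMPartitionsByFilter done", "2013-08-27 17:14:23,661"),
    ("parts after pruning logged", "2013-08-27 17:22:32,410"),
    ("transaction committed", "2013-08-27 17:30:14,631"),
]


def phase_durations(events):
    """Return (label, seconds since the previous event) for each event after the first."""
    parsed = [(label, datetime.strptime(ts, FMT)) for label, ts in events]
    return [
        (label, (when - parsed[i][1]).total_seconds())
        for i, (label, when) in enumerate(parsed[1:])
    ]


if __name__ == "__main__":
    gaps = phase_durations(EVENTS)
    for label, seconds in gaps:
        print(f"{label}: +{seconds:.3f} s")
    print(f"total: {sum(s for _, s in gaps):.3f} s")
```

With the timestamps above, the three gaps come to roughly 217 s, 489 s and 462 s, about 19.5 minutes in total with DEBUG logging enabled. The last gap (462.221 s) matches the "Transaction committed in 462221 ms" line.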
