Hive >> mail # user >> Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors


agateaaa 2013-07-29, 17:43
Nitin Pawar 2013-07-29, 18:02
agateaaa 2013-07-29, 18:29
agateaaa 2013-07-30, 00:22
Nitin Pawar 2013-07-30, 07:49
agateaaa 2013-07-31, 16:48

Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
Hi All:

I put some debugging code in TUGIContainingTransport.getTransport() and
tracked the problem down to this method:

@Override
public TUGIContainingTransport getTransport(TTransport trans) {

  // UGI information is not available at connection setup time; it will be
  // set later via the set_ugi() RPC.
  transMap.putIfAbsent(trans, new TUGIContainingTransport(trans));

  // return transMap.get(trans);  <- changed
  TUGIContainingTransport retTrans = transMap.get(trans);

  if (retTrans == null) {

}
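
The snippet above is cut off in the archive, so the following is not the author's change. It is only a minimal sketch of one way to make such a lookup never return null: use the return value of putIfAbsent() instead of a separate get(). The classes Transport, WrappedTransport and TransportCache are hypothetical stand-ins for TTransport, TUGIContainingTransport and the factory, and a plain ConcurrentHashMap is assumed here. If the real map can drop entries between the two calls (for example a map with weak references), the separate get() could return null, which would be consistent with the null check added above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class TransportCache {
    /** Hypothetical stand-in for Thrift's TTransport. */
    static class Transport {}

    /** Hypothetical stand-in for TUGIContainingTransport. */
    static class WrappedTransport {
        final Transport wrapped;
        WrappedTransport(Transport wrapped) { this.wrapped = wrapped; }
    }

    // Assumption: a plain concurrent map; the real Hive map may be configured differently.
    private final ConcurrentMap<Transport, WrappedTransport> transMap =
            new ConcurrentHashMap<>();

    WrappedTransport getTransport(Transport trans) {
        WrappedTransport candidate = new WrappedTransport(trans);
        // putIfAbsent returns the previous mapping, or null if our candidate was stored.
        WrappedTransport existing = transMap.putIfAbsent(trans, candidate);
        // Either way we hold a strong reference to a non-null wrapper.
        return existing != null ? existing : candidate;
    }

    public static void main(String[] args) {
        TransportCache cache = new TransportCache();
        Transport t = new Transport();
        WrappedTransport first = cache.getTransport(t);
        WrappedTransport second = cache.getTransport(t);
        System.out.println("same wrapper reused: " + (first == second));
    }
}
```

Because the method returns either the existing wrapper or the freshly created one, there is no window between "insert" and "read" in which another thread or the garbage collector can make the result null.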

On Wed, Jul 31, 2013 at 9:48 AM, agateaaa <[EMAIL PROTECTED]> wrote:

> Thanks Nitin
>
> There aren't too many connections in CLOSE_WAIT state, only one or two, when
> we run into this. Most likely it's because of a dropped connection.
>
> I could not find any read or write timeouts that we can set for the Thrift
> server to tell it to hold on to the client connection.
> See https://issues.apache.org/jira/browse/HIVE-2006, but it doesn't seem
> to have been implemented yet. We have set a client connection timeout
> but cannot find an equivalent setting for the server.
>
> We suspect that this happens when we run two client processes
> which modify two distinct partitions of the same Hive table. We put in a
> workaround so that the two Hive client processes never run together, and so
> far things look OK, but we will keep monitoring.
>
> Could it be because the Hive metastore server is not thread safe? Would
> running two ALTER TABLE statements on two distinct partitions of the same
> table over two client connections cause problems like these, where the Hive
> metastore server closes or drops the wrong client connection and leaves the
> other hanging?
>
> Agateaaa
>
>
>
>
> On Tue, Jul 30, 2013 at 12:49 AM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>
>> The flow you mention is invoked when the Thrift metastore client-server
>> connection runs in unsecure mode. So one way to avoid this is to use
>> secure mode.
>>
>> <code>
>> public boolean process(final TProtocol in, final TProtocol out)
>>     throws TException {
>>   setIpAddress(in);
>>   ...
>>   ...
>>   ...
>> @Override
>> protected void setIpAddress(final TProtocol in) {
>>   TUGIContainingTransport ugiTrans =
>>       (TUGIContainingTransport) in.getTransport();
>>   Socket socket = ugiTrans.getSocket();
>>   if (socket != null) {
>>     setIpAddress(socket);
>>
>> </code>
>>
>>
>> From the above code snippet, it looks like the null pointer exception is
>> not handled if getSocket() returns null.
>>
>> Can you check what the ulimit setting is on the server? If it's set to the
>> default, can you set it to unlimited and restart the hcat server? (This is
>> just a wild guess.)
>>
>> Also, the description of the getSocket method says: "If the underlying
>> TTransport is an instance of TSocket, it returns the Socket object which it
>> contains. Otherwise it returns null."
>>
>> So one of the Thrift gurus needs to tell us what's happening. I don't have
>> knowledge at this depth.
>>
>> Maybe Ashutosh or Thejas will be able to help with this.
>>
>> From the netstat CLOSE_WAIT output, it looks like the Hive metastore server
>> has not closed the connection (I do not know why yet); maybe the Hive dev
>> guys can help. Are there too many connections in CLOSE_WAIT state?
>>
>>
>>
>> On Tue, Jul 30, 2013 at 5:52 AM, agateaaa <[EMAIL PROTECTED]> wrote:
>>
>> > Looking at the Hive metastore server logs, I see errors like these:
>> >
>> > 2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message.
>> > java.lang.NullPointerException
>> >         at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183)
>> >         at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79)
>> >         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
>> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
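
The stack trace and the quoted setIpAddress() snippet point at a dereference that can fail before the socket null check is ever reached. As a rough illustration only, the sketch below guards both the wrapped transport and its socket. IpAddressGuard, WrappedTransport and resolveIpAddress are hypothetical names invented for this example; they are not Hive or Thrift APIs, and this is not the fix that went into Hive.

```java
import java.net.InetAddress;
import java.net.Socket;

class IpAddressGuard {
    /** Hypothetical stand-in for TUGIContainingTransport. */
    static class WrappedTransport {
        // Null when the underlying transport is not socket-based.
        private final Socket socket;
        WrappedTransport(Socket socket) { this.socket = socket; }
        Socket getSocket() { return socket; }
    }

    /**
     * Returns the peer address as a string, or null when the transport,
     * its socket, or the remote address is unavailable.
     */
    static String resolveIpAddress(WrappedTransport ugiTrans) {
        if (ugiTrans == null) {
            return null; // the lookup that produced the transport returned nothing
        }
        Socket socket = ugiTrans.getSocket();
        if (socket == null) {
            return null; // transport is not backed by a socket
        }
        InetAddress remote = socket.getInetAddress(); // null if not connected
        return remote != null ? remote.getHostAddress() : null;
    }

    public static void main(String[] args) {
        System.out.println(resolveIpAddress(null));
        System.out.println(resolveIpAddress(new WrappedTransport(null)));
    }
}
```

Returning null instead of throwing keeps the worker thread alive, but it also hides the underlying cause, so logging the missing transport at the call site would still be worthwhile.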
agateaaa 2013-08-29, 18:22
Ashutosh Chauhan 2013-08-29, 18:53
agateaaa 2013-08-29, 21:39
agateaaa 2013-08-29, 21:39
agateaaa 2013-07-30, 00:21