Hive, mail # user - Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors


agateaaa 2013-07-29, 17:43
Nitin Pawar 2013-07-29, 18:02
agateaaa 2013-07-29, 18:29
agateaaa 2013-07-30, 00:22
Nitin Pawar 2013-07-30, 07:49
agateaaa 2013-07-31, 16:48
Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
agateaaa 2013-08-29, 17:57
Hi All:

I put some debugging code in TUGIContainingTransport.getTransport() and
tracked it down to:

@Override
public TUGIContainingTransport getTransport(TTransport trans) {

  // UGI information is not available at connection setup time; it will be
  // set later via the set_ugi() RPC.
  transMap.putIfAbsent(trans, new TUGIContainingTransport(trans));

  // return transMap.get(trans);   <- changed
  TUGIContainingTransport retTrans = transMap.get(trans);

  if (retTrans == null) {

  }
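The excerpt reads the map a second time with get() right after putIfAbsent(), and the null check added after that read suggests it sometimes came back empty. That can happen if an entry can disappear between the two calls, for instance in a map that holds its values only through weak references (an assumption here, not something this message confirms). A general Java pattern that avoids the second lookup is to keep a local reference to the value you create and use the return value of putIfAbsent(). A minimal sketch on a plain ConcurrentHashMap, with hypothetical class and method names; it is not necessarily the change that ended up in Hive:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical registry that wraps each key in a value exactly once.
final class WrapperRegistry<K, V> {
  private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();
  private final Function<K, V> factory; // must not return null

  WrapperRegistry(Function<K, V> factory) {
    this.factory = factory;
  }

  /** Returns the wrapper for key, creating it if needed; never returns null. */
  V get(K key) {
    V existing = map.get(key);
    if (existing != null) {
      return existing;
    }
    V created = factory.apply(key);
    // putIfAbsent returns the value that was already there, or null if ours was stored.
    V raced = map.putIfAbsent(key, created);
    // Answer from local references only, so no second lookup can miss the entry.
    return raced != null ? raced : created;
  }

  int size() {
    return map.size();
  }
}
```

The method hands back either the value that was already mapped or the one it just created; a factory that returns null is rejected by ConcurrentHashMap with a NullPointerException rather than silently producing a null result.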

On Wed, Jul 31, 2013 at 9:48 AM, agateaaa <[EMAIL PROTECTED]> wrote:

> Thanks Nitin
>
> There aren't many connections in CLOSE_WAIT state, only one or two when we
> run into this. Most likely it's because of a dropped connection.
>
> I could not find any read or write timeout we can set for the Thrift
> server that would tell Thrift to hold on to the client connection.
> See https://issues.apache.org/jira/browse/HIVE-2006, but it doesn't seem
> to have been implemented yet. We have set a client connection timeout,
> but cannot find an equivalent setting for the server.
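At the socket level, a server-side read timeout amounts to setting SO_TIMEOUT on each accepted connection; as far as I know, libthrift's TServerSocket has a constructor taking a client timeout that applies it this way. The JDK-only sketch below shows just that mechanism; the class name is hypothetical, and it says nothing about which settings Hive 0.9 actually exposes:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo of a per-connection read timeout on the server side.
final class ReadTimeoutDemo {
  /** Returns true if the server's read timed out because the client sent nothing. */
  static boolean serverReadTimesOut(int timeoutMillis) {
    InetAddress lo = InetAddress.getLoopbackAddress();
    try (ServerSocket server = new ServerSocket(0, 1, lo);
         // A client that connects but never writes, like a stalled peer.
         Socket client = new Socket(lo, server.getLocalPort());
         Socket accepted = server.accept()) {
      // SO_TIMEOUT: a blocking read on this socket gives up after timeoutMillis.
      accepted.setSoTimeout(timeoutMillis);
      try {
        accepted.getInputStream().read();
        return false; // data or end-of-stream arrived before the timeout
      } catch (SocketTimeoutException e) {
        return true; // nothing arrived; the server is free to close the connection
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The connection itself stays open after a SocketTimeoutException; whether to close it or keep waiting is up to the server code.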
>
> We suspect this happens when we run two client processes that
> modify two distinct partitions of the same Hive table. We put in a
> workaround so that the two Hive client processes never run together, and
> so far things look OK, but we will keep monitoring.
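The thread doesn't say how the two client processes were kept from running together. Assuming both run on the same host, one simple option is to wrap each run in an exclusive OS-level file lock. A minimal sketch; the helper name and lock-file path are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Supplier;

// Hypothetical helper: run a task while holding an exclusive lock on a file,
// so two processes on the same host never run it at the same time.
final class ExclusiveRunner {
  static <T> T runExclusively(Path lockFile, Supplier<T> task) {
    try (FileChannel channel = FileChannel.open(lockFile,
             StandardOpenOption.CREATE, StandardOpenOption.WRITE);
         // lock() blocks until no other process holds a lock on this file.
         // (Inside one JVM an overlapping lock throws instead of blocking.)
         FileLock lock = channel.lock()) {
      return task.get();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The lock is released when the try block exits, so the second process starts its ALTER TABLE work only after the first has finished.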
>
> Could it be that the Hive metastore server is not thread-safe? Would
> running two ALTER TABLE statements on two distinct partitions of the same
> table over two client connections cause problems like these, where the
> metastore server closes or drops the wrong client connection and leaves
> the other one hanging?
>
> Agateaaa
>
>
>
>
> On Tue, Jul 30, 2013 at 12:49 AM, Nitin Pawar <[EMAIL PROTECTED]> wrote:
>
>> The code path you mention is used when the Thrift metastore client-server
>> connection runs in unsecure mode, so one way to avoid this is to use a
>> secure connection instead.
>>
>> <code>
>> public boolean process(final TProtocol in, final TProtocol out)
>>     throws TException {
>>   setIpAddress(in);
>>   ...
>> }
>>
>> @Override
>> protected void setIpAddress(final TProtocol in) {
>>   TUGIContainingTransport ugiTrans =
>>       (TUGIContainingTransport) in.getTransport();
>>   Socket socket = ugiTrans.getSocket();
>>   if (socket != null) {
>>     setIpAddress(socket);
>>   }
>> }
>> </code>
>>
>>
>> From the above code snippet, it looks like the NullPointerException is
>> not handled if getSocket() returns null.
>>
>> Can you check what the ulimit setting is on the server? If it is at the
>> default, can you set it to unlimited and restart the HCat server? (This is
>> just a wild guess.)
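The limit that matters is the one the metastore server process itself runs with, which is not necessarily what ulimit -n prints in a login shell (the JVM may also raise its own soft limit at startup). Assuming a HotSpot/OpenJDK JVM on a Unix-like OS, the process can report its open and maximum file descriptor counts through the com.sun.management extension. A minimal sketch with a hypothetical class name:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: report this JVM's file descriptor usage and limit.
final class FdLimits {
  /** Returns {open, max} file descriptor counts, or null if the JVM can't tell. */
  static long[] openAndMax() {
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
      com.sun.management.UnixOperatingSystemMXBean unix =
          (com.sun.management.UnixOperatingSystemMXBean) os;
      return new long[] {
          unix.getOpenFileDescriptorCount(), // descriptors currently open
          unix.getMaxFileDescriptorCount()   // per-process limit in effect
      };
    }
    return null; // not a Unix-like OS, or a JVM without this extension
  }
}
```

Logging these two numbers periodically from inside the server would show whether it is actually approaching its descriptor limit when the errors appear.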
>>
>> Also, the comment on the getSocket() method says: "If the underlying
>> TTransport is an instance of TSocket, it returns the Socket object which it
>> contains. Otherwise it returns null."
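Note that the setIpAddress snippet already guards against getSocket() returning null with its if (socket != null) check. A NullPointerException on that line therefore seems at least as consistent with in.getTransport() itself returning null, which is where agateaaa's later debugging of getTransport() points. The sketch below shows the documented getSocket() contract plus a caller that tolerates both kinds of null; all types are hypothetical stand-ins for the Thrift and Hive classes, not their real definitions:

```java
import java.net.Socket;

// Hypothetical stand-ins for the Thrift transport types.
interface Transport {}

final class SocketTransport implements Transport {
  private final Socket socket;
  SocketTransport(Socket socket) { this.socket = socket; }
  Socket getSocket() { return socket; }
}

final class UgiWrapper implements Transport {
  private final Transport wrapped;
  UgiWrapper(Transport wrapped) { this.wrapped = wrapped; }

  // Same contract as quoted above: the Socket if the wrapped transport is
  // socket-based, otherwise null.
  Socket getSocket() {
    return (wrapped instanceof SocketTransport)
        ? ((SocketTransport) wrapped).getSocket()
        : null;
  }
}

final class IpAddressHelper {
  /** Returns the client address, or null if it can't be determined. */
  static String clientAddress(Transport t) {
    // instanceof is false for null, so a missing transport is handled here too.
    if (!(t instanceof UgiWrapper)) {
      return null;
    }
    Socket socket = ((UgiWrapper) t).getSocket();
    // getInetAddress() is null for a socket that was never connected.
    if (socket == null || socket.getInetAddress() == null) {
      return null;
    }
    return socket.getInetAddress().getHostAddress();
  }
}
```

Guarding the caller like this only hides the symptom; if the transport itself is null, the real question is still why the lookup that produced it came back empty.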
>>
>> So one of the Thrift gurus needs to tell us what's happening; my knowledge
>> doesn't go this deep.
>>
>> Maybe Ashutosh or Thejas will be able to help with this.
>>
>>
>>
>>
>> From the netstat CLOSE_WAIT output, it looks like the Hive metastore server
>> has not closed the connection (I don't know why yet); maybe the Hive devs
>> can help. Are there many connections in CLOSE_WAIT state?
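For context on the netstat output: a socket is in CLOSE_WAIT when the remote side has closed the connection but the local application has not yet called close() on its end. On the server, the peer's close shows up as read() returning -1, and the socket leaves CLOSE_WAIT only once the server closes it. A JDK-only sketch of that sequence (class name hypothetical, unrelated to Hive's own code):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo: the server notices the peer's close and closes its own end.
final class PeerCloseDemo {
  /** Returns true if the server side sees end-of-stream after the client closes. */
  static boolean serverSeesPeerClose() {
    InetAddress lo = InetAddress.getLoopbackAddress();
    try (ServerSocket server = new ServerSocket(0, 1, lo);
         Socket client = new Socket(lo, server.getLocalPort());
         // try-with-resources closes this socket, so it cannot linger in CLOSE_WAIT.
         Socket accepted = server.accept()) {
      accepted.setSoTimeout(5000); // safety net so the demo cannot hang
      client.close();              // the peer closes first and sends FIN
      // -1 means end of stream: from here until close(), the socket is in CLOSE_WAIT.
      return accepted.getInputStream().read() == -1;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A handful of CLOSE_WAIT sockets that never go away usually means some code path saw the peer disconnect (or an exception) and returned without closing its end.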
>>
>>
>>
>> On Tue, Jul 30, 2013 at 5:52 AM, agateaaa <[EMAIL PROTECTED]> wrote:
>>
>> > Looking at the Hive metastore server logs, I see errors like these:
>> >
>> > 2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message.
>> > java.lang.NullPointerException
>> >         at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183)
>> >         at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79)
>> >         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
>> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
agateaaa 2013-08-29, 18:22
Ashutosh Chauhan 2013-08-29, 18:53
agateaaa 2013-08-29, 21:39
agateaaa 2013-08-29, 21:39
agateaaa 2013-07-30, 00:21