Open connection to broker waits forever

cwan
Greetings.

I have a problem with qpid clients getting stuck in the open connection call
while reconnecting to the broker.

Our system uses the qpid-cpp 1.38 C++ client and broker with AMQP 0-10 on
RHEL 7.5. Once in a while, the network connections between the clients and
the broker break, and when the clients reconnect, some of them block because
the open connection call never returns.

Based on the stack trace (see below), it looks like the qpid client is
waiting for a connection to open, but the connection is never established.

Stack Trace:
Thread 1 (Thread 0x7f8979af28c0 (LWP 36968)):
#0  0x00007f8977c7f995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f89785be613 in qpid::sys::Condition::wait(qpid::sys::Mutex&) () from /lib64/libqpidclient.so.2
#2  0x00007f89785ea683 in qpid::client::StateManager::waitFor(std::set<int, std::less<int>, std::allocator<int> >) () from /lib64/libqpidclient.so.2
#3  0x00007f89785c226f in qpid::client::ConnectionHandler::waitForOpen() () from /lib64/libqpidclient.so.2
#4  0x00007f89785c808a in qpid::client::ConnectionImpl::open() () from /lib64/libqpidclient.so.2
#5  0x00007f89785bfb68 in qpid::client::Connection::open(qpid::client::ConnectionSettings const&) () from /lib64/libqpidclient.so.2
#6  0x00007f89785c01ed in qpid::client::Connection::open(qpid::Url const&, qpid::client::ConnectionSettings const&) () from /lib64/libqpidclient.so.2
#7  0x00007f89796a119f in qpid::client::amqp0_10::ConnectionImpl::tryConnect() () from /lib64/libqpidmessaging.so.2
#8  0x00007f89796a28f4 in qpid::client::amqp0_10::ConnectionImpl::connect(qpid::sys::AbsTime const&) () from /lib64/libqpidmessaging.so.2
#9  0x00007f89796a3c93 in qpid::client::amqp0_10::ConnectionImpl::open() () from /lib64/libqpidmessaging.so.2
#10 0x00007f89796c44e4 in qpid::messaging::Connection::open() () from /lib64/libqpidmessaging.so.2

I have seen two scenarios in which the open connection call gets stuck:
* SSL forcehandshake never completes
* After an epoll error like this:
2018-07-19 15:36:35 [System] error Caught exception in state: 1 with event: 4: No such file or directory (/builddir/build/BUILD/qpid-cpp-1.38.0/src/qpid/sys/epoll/EpollPoller.cpp:357)
2018-07-19 15:36:35 [Security] warning Connect failed: Connection refused
Failed to connect (reconnect disabled)

It is a rare event that sometimes takes weeks or months to occur.  But when
it does, we have to manually restart the client process in order to
re-establish the broker connection.

I am seeking guidance to address this problem.  
I have two ideas so far:
1. Instead of calling qpid::client::StateManager::waitFor(std::set<int>
desired), call qpid::client::StateManager::waitFor(std::set<int> desired,
qpid::sys::Duration timeout).  If I understand it correctly, a timeout
would ensure that the open connection call eventually returns.  But I am not
sure whether this would break other functionality.
2. Build a monitor in my code, and if the qpid open connection call has not
returned after some time, forcibly kill the connection threads and
reconnect again (this seems like the less desirable option).

By the way, we don't use the qpid client's auto-reconnect because some custom
cleanup is required after a disconnect.  These are the settings used:
{"transport":"ssl","heartbeat":10,"reconnect":false,"tcp_nodelay":true}
The software workflow is like this:
1. On disconnect, destroy the connection object
2. Create a new connection
3. Call Connection::open
4-1. If a connection is opened, create the session, sender and receiver
4-2. If no connection is established, repeat from step 1
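
As a rough sketch, the workflow above has this shape. The Link interface, makeLink, and the maxAttempts limit are assumptions made up for illustration; in the real client, Link would wrap qpid::messaging::Connection and the loop has no attempt limit.

```cpp
#include <functional>
#include <memory>

// Hypothetical minimal interface standing in for the real connection wrapper.
struct Link {
    virtual ~Link() = default;
    virtual bool open() = 0;          // step 3: false if no connection was made
    virtual void setupSession() = 0;  // step 4-1: session, sender, receiver
};

// Repeats steps 1-3 until open() succeeds or maxAttempts is used up.
// Returns the opened link, or nullptr if every attempt failed.
std::unique_ptr<Link> reconnect(
    const std::function<std::unique_ptr<Link>()>& makeLink, int maxAttempts)
{
    std::unique_ptr<Link> link;
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        link.reset();              // step 1: destroy the old connection object
        link = makeLink();         // step 2: create a new connection
        if (link->open()) {        // step 3: call open
            link->setupSession();  // step 4-1: connection established
            return link;
        }
        // step 4-2: no connection, repeat from step 1
    }
    return nullptr;
}
```

The loop only makes progress if open() returns at all, which is exactly what does not happen in the hang described above.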

Regards,

Chen Wan



--
Sent from: http://qpid.2158936.n2.nabble.com/Apache-Qpid-users-f2158936.html

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org

Re: [qpid-cpp client] Open connection to broker waits forever - patch attached

cwan
qpid_client_connect_timeout.patch
<http://qpid.2158936.n2.nabble.com/file/t396381/qpid_client_connect_timeout.patch>  

Attached is a patch that adds a connect timeout to the qpid-cpp client's open
connection call.
A new option called "connect-timeout" specifies how long to wait (in seconds)
for the Connection::open call.  When "connect-timeout" is set, the qpid-cpp
client calls waitFor with a timeout.

I have tested it in my environment, and it seems to time out properly when
the open call gets stuck.
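
Independent of the qpid sources, the idea is roughly the following. This is a simplified stand-in written with standard C++ only: StateWaiter, waitForFinal and the State enum are made-up names and not the actual StateManager API.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the client's connection state tracking.
enum class State { Connecting, Open, Closed, Failed };

class StateWaiter {
public:
    void set(State s) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            state_ = s;
        }
        changed_.notify_all();
    }

    State get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return state_;
    }

    // Waits until the state leaves Connecting or until `limit` elapses.
    // Returns true if a final state was reached, false on timeout.
    bool waitForFinal(std::chrono::milliseconds limit) {
        std::unique_lock<std::mutex> lock(mutex_);
        return changed_.wait_for(lock, limit,
                                 [this] { return state_ != State::Connecting; });
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable changed_;
    State state_ = State::Connecting;
};

// On timeout, mark the connection as failed so the caller gets control back.
bool waitForOpen(StateWaiter& waiter, std::chrono::milliseconds limit) {
    if (!waiter.waitForFinal(limit)) {
        waiter.set(State::Failed);
    }
    return waiter.get() == State::Open;
}
```

Without the timeout, a state change that never arrives leaves the caller blocked on the condition variable indefinitely, which matches the stack trace from the first post.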

The main part of patch is this:
diff --git a/src/qpid/client/ConnectionHandler.cpp b/src/qpid/client/ConnectionHandler.cpp
index 4f044c2f3..77d43f191 100644
--- a/src/qpid/client/ConnectionHandler.cpp
+++ b/src/qpid/client/ConnectionHandler.cpp
@@ -148,16 +148,7 @@ void ConnectionHandler::outgoing(AMQFrame& frame)
 
 void ConnectionHandler::waitForOpen()
 {
-    if (ConnectionSettings::connectTimeout) {
-        if (!waitFor(ESTABLISHED, qpid::sys::Duration(ConnectionSettings::connectTimeout * qpid::sys::TIME_SEC))) {
-            errorText = "Connection open timed out";
-            QPID_LOG(warning, errorText);
-            setState(FAILED);
-        }
-    } else {
-        waitFor(ESTABLISHED);//ESTABLISHED = OPEN, CLOSED or FAILED
-    }
-




Re: [qpid-cpp client] Open connection to broker waits forever - patch attached

Gordon Sim
On 24/07/18 15:27, cwan wrote:
> qpid_client_connect_timeout.patch
> <http://qpid.2158936.n2.nabble.com/file/t396381/qpid_client_connect_timeout.patch>
>
> Attached is a patch to add connect timeout to qpid-cpp client's open
> connection call.
> A new option called "connect-timeout" can be used to specify how long to
> wait (in seconds) for the connection::open call.  When the "connect-timeout"
> is set, qpid-cpp client calls waitFor with a timeout.

Your patch seems to be reversed. The best thing is to open a JIRA for it
and attach the patch to that. It looks ok to me though.

Re: [qpid-cpp client] Open connection to broker waits forever - patch attached

cwan
Hi Gordon,

You're right. My apologies for messing up the patch.
I have created https://issues.apache.org/jira/browse/QPID-8221, and attached
the correct patch to that issue.

Thanks,

Chen


