[jira] [Created] (QPID-3963) A federated broker may not reconnect to a remote cluster on link failure.

JIRA jira@apache.org
A federated broker may not reconnect to a remote cluster on link failure.
-------------------------------------------------------------------------

                 Key: QPID-3963
                 URL: https://issues.apache.org/jira/browse/QPID-3963
             Project: Qpid
          Issue Type: Bug
          Components: C++ Broker, C++ Clustering
    Affects Versions: 0.14
            Reporter: Ken Giusti
            Assignee: Ken Giusti


When a broker is federated with a cluster, the cluster informs the broker of the failover addresses that are valid for the cluster.  Should a cluster member fail, the broker will reconnect to another member of that cluster.

However, the federated broker only queries the cluster for these failover addresses when it first connects to the cluster.  Should the cluster topology change, the federated broker's list of available failover addresses will become out-of-date.  This can prevent the broker from correctly re-connecting on failure of a cluster member.

Example:
Given a cluster with members C1 and C2, and a separate broker B, federate B to connect to C1.  On connecting to C1, B learns the address of C2 as an alternate failover address.  Now shut down C1.  B will reconnect to C2, and learn that C2 is the only member of the cluster (i.e. no failover addresses).  After B connects, restart C1 and let it join the cluster.  Then shut down C2.  Since B does not know that C1 has become available again, B will not attempt to reconnect to it.  Instead, it tries to reconnect to C2 indefinitely.

The expected behavior would be to have B reconnect to C1.
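
The sequence above can be replayed in a small C++ model. This is only an illustrative sketch: FederatedBroker, Members, and replayScenario are hypothetical names, not Qpid classes, and the model assumes that B snapshots the member list only at connect time, as described in this report.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: the set of cluster members currently running.
using Members = std::set<std::string>;

// Hypothetical stand-in for the federated broker B. It records failover
// addresses only when a connection is established.
struct FederatedBroker {
    std::string current;             // member B is connected to
    std::vector<std::string> known;  // addresses B may try on failure

    // Connect to `member` and snapshot the other members as failover addresses.
    void connect(const std::string& member, const Members& cluster) {
        assert(cluster.count(member) == 1);
        current = member;
        known.assign(1, member);
        for (const std::string& m : cluster)
            if (m != member) known.push_back(m);
    }

    // On link failure, try every known address; return false if none is up.
    bool reconnect(const Members& cluster) {
        for (const std::string& addr : known) {
            if (cluster.count(addr)) {
                const std::string target = addr;  // copy: connect() rewrites `known`
                connect(target, cluster);
                return true;
            }
        }
        return false;  // every known address is down
    }
};

// Replays the C1/C2 sequence; returns true if B ends up connected.
bool replayScenario() {
    Members cluster = {"C1", "C2"};
    FederatedBroker b;
    b.connect("C1", cluster);                 // B learns C2 as a failover address
    cluster.erase("C1");                      // shut down C1
    if (!b.reconnect(cluster)) return false;  // B moves to C2; known is now {C2}
    cluster.insert("C1");                     // C1 rejoins; B is not told
    cluster.erase("C2");                      // shut down C2
    return b.reconnect(cluster);              // only C2 is known
}
```

In this model replayScenario() returns false: after the second failure the only address B knows is C2, which is down.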

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[jira] [Commented] (QPID-3963) A federated broker may not reconnect to a remote cluster on link failure.

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/QPID-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259948#comment-13259948 ]

[hidden email] commented on QPID-3963:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4846/
-----------------------------------------------------------

Review request for qpid, Alan Conway and Gordon Sim.


Summary
-------

Still a WIP, but I wanted early feedback as I'm not too experienced with the subscription management code involved (completely stolen from Alan).

This patch allows the Link to subscribe to the remote broker's amq.failover exchange - if it exists.  This allows the Link to be updated dynamically should the remote broker be part of a cluster, and the cluster membership changes.

Light testing against a cluster confirms that this patch fixes qpid-3963.  Testing against a non-cluster remote causes the remote to log the following error, but otherwise behaves ok:

2012-04-23 16:45:27 error Execution exception: not-found: Exchange not found: amq.failover (../../../qpid/cpp/src/qpid/broker/ExchangeRegistry.cpp:101)
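
As a rough illustration of the idea, and not the code in the diff, the sketch below keeps a link's address list current by applying each membership update it receives. LinkSketch, onMembershipUpdate, and pickReconnectTarget are hypothetical names; how updates arrive from amq.failover is assumed and not modelled, and replacing the whole list on each update is just one possible policy.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the broker Link; not the real qpid::broker::Link.
class LinkSketch {
  public:
    explicit LinkSketch(const std::string& initial) : known(1, initial) {}

    // Called for each membership update received from the remote's
    // amq.failover exchange (the delivery mechanism is assumed here).
    void onMembershipUpdate(const std::vector<std::string>& members) {
        if (!members.empty()) known = members;  // ignore empty updates
    }

    // Pick the first known address that `isUp` reports reachable;
    // returns an empty string if none is.
    template <typename Pred>
    std::string pickReconnectTarget(Pred isUp) const {
        assert(!known.empty());
        for (const std::string& addr : known)
            if (isUp(addr)) return addr;
        return std::string();
    }

  private:
    std::vector<std::string> known;  // current failover candidates
};
```

With an update of {"C2", "C1"} applied, a link that started on C2 can pick C1 once C2 goes down; without the update it has no usable target.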


This addresses bug qpid-3963.
    https://issues.apache.org/jira/browse/qpid-3963


Diffs
-----

  /trunk/qpid/cpp/src/qpid/broker/Link.h 1329301
  /trunk/qpid/cpp/src/qpid/broker/Link.cpp 1329301

Diff: https://reviews.apache.org/r/4846/diff


Testing
-------

minimal.


Thanks,

Kenneth


               


[jira] [Commented] (QPID-3963) A federated broker may not reconnect to a remote cluster on link failure.

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/QPID-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260344#comment-13260344 ]

[hidden email] commented on QPID-3963:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4846/#review7168
-----------------------------------------------------------


Approach seems reasonable to me. Might be worth checking behaviour for push routes as well, as they are a little ugly/odd.


/trunk/qpid/cpp/src/qpid/broker/Link.cpp
<https://reviews.apache.org/r/4846/#comment15823>

    No, that will be done automatically when the session ends.


- Gordon


On 2012-04-23 20:57:48, Kenneth Giusti wrote:
bq.  [...]


               


[jira] [Commented] (QPID-3963) A federated broker may not reconnect to a remote cluster on link failure.

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/QPID-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260604#comment-13260604 ]

[hidden email] commented on QPID-3963:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4846/#review7174
-----------------------------------------------------------


Looks good in general, a couple comments - I'm most concerned about the hard-coded channel.


/trunk/qpid/cpp/src/qpid/broker/Link.h
<https://reviews.apache.org/r/4846/#comment15824>

    nit: suggest naming  "failoverExchange" or something similarly descriptive



/trunk/qpid/cpp/src/qpid/broker/Link.cpp
<https://reviews.apache.org/r/4846/#comment15825>

    why do we want to reserve a channel rather than assign one in the normal course of opening a session? Seems error prone.



/trunk/qpid/cpp/src/qpid/broker/Link.cpp
<https://reviews.apache.org/r/4846/#comment15826>

    Might be worth merging the addresses to eliminate duplicates. I actually thought I had already done this, but it doesn't appear so. I dreamed that I added Url::merge(const Address&) and Url::merge(const Url&).
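
A duplicate-free merge of this kind could look roughly like the sketch below. It is an assumption-laden illustration: addresses are simplified to plain "host:port" strings, and AddressList, mergeAddress, and mergeList are hypothetical free functions, not the qpid::Url interface.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplification: an address is just "host:port" text here.
using AddressList = std::vector<std::string>;

// Append `addr` unless an equal entry is already present.
// Returns true if the list changed.
bool mergeAddress(AddressList& list, const std::string& addr) {
    assert(!addr.empty());  // an empty address is a caller error in this sketch
    if (std::find(list.begin(), list.end(), addr) != list.end()) return false;
    list.push_back(addr);
    return true;
}

// Merge every entry of `other` into `list`, keeping the existing order.
void mergeList(AddressList& list, const AddressList& other) {
    for (const std::string& addr : other) mergeAddress(list, addr);
}
```

The linear search keeps the original ordering of the list, which seems a reasonable trade-off for the handful of addresses a cluster normally advertises.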



/trunk/qpid/cpp/src/qpid/broker/Link.cpp
<https://reviews.apache.org/r/4846/#comment15827>

    Aside: I'm not sure what "closed by management" means. Do we actually know the link dtor is only called as a result of a management command?



/trunk/qpid/cpp/src/qpid/broker/Link.cpp
<https://reviews.apache.org/r/4846/#comment15828>

    I don't think you need to do anything here.



/trunk/qpid/cpp/src/qpid/broker/Link.cpp
<https://reviews.apache.org/r/4846/#comment15829>

    Aside again: this pattern pops up a couple of times now, can we abstract a little "broker subscription toolkit" to simplify this pattern & put the common logic in one place?



/trunk/qpid/cpp/src/qpid/broker/Link.cpp
<https://reviews.apache.org/r/4846/#comment15830>

    I'm definitely queasy about this hard-coded channel number. The session code doesn't know about it and could re-use it.
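
To make the concern concrete, here is a sketch of handing out channel numbers from one shared allocator so that two users cannot pick the same one. ChannelAllocator is a hypothetical class, not the existing session code, and starting the numbering at 1 is an arbitrary choice for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical allocator: hands out channel numbers not currently in use.
class ChannelAllocator {
  public:
    // Returns the lowest free channel, starting from 1, and marks it used.
    std::uint16_t acquire() {
        std::uint16_t candidate = 1;
        for (std::uint16_t used : inUse) {  // std::set iterates in ascending order
            if (used != candidate) break;   // found a gap
            ++candidate;
        }
        assert(candidate != 0);             // wrapped around: no channel is free
        inUse.insert(candidate);
        return candidate;
    }

    // Marks a channel free again, e.g. when its session ends.
    void release(std::uint16_t channel) { inUse.erase(channel); }

  private:
    std::set<std::uint16_t> inUse;  // channels currently handed out
};
```

If every session, including the failover subscription, took its channel from one such allocator, a reserved constant would not be needed and accidental re-use would be ruled out by construction.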


- Alan


On 2012-04-23 20:57:48, Kenneth Giusti wrote:
bq.  [...]


               


[jira] [Commented] (QPID-3963) A federated broker may not reconnect to a remote cluster on link failure.

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/QPID-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262897#comment-13262897 ]

[hidden email] commented on QPID-3963:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4846/
-----------------------------------------------------------

(Updated 2012-04-26 19:19:31.447871)


Review request for qpid, Alan Conway and Gordon Sim.


Changes
-------

This patch should be final - I've got more testing to do and that might result in some changes, but consider this my solution for QPID-3963.

I also will be adding a unit test to verify the fix, but that is TBD.

In summary:

1) Each broker Link attempts to subscribe to the amq.failover exchange on the remote.
2) The set of failover URLs learned from the remote is replicated on the local cluster when a new member is added.

I've also tried to apply most of the comments from the last review.

Thanks, -K


Diffs (updated)
-----

  /trunk/qpid/cpp/xml/cluster.xml 1329301
  /trunk/qpid/cpp/src/qpid/cluster/Connection.h 1329301
  /trunk/qpid/cpp/src/qpid/cluster/Connection.cpp 1329301
  /trunk/qpid/cpp/src/qpid/cluster/UpdateClient.cpp 1329301
  /trunk/qpid/cpp/src/qpid/broker/LinkRegistry.cpp 1329301
  /trunk/qpid/cpp/src/qpid/broker/Link.cpp 1329301
  /trunk/qpid/cpp/src/qpid/broker/ExchangeRegistry.cpp 1329301
  /trunk/qpid/cpp/src/qpid/broker/Link.h 1329301

Diff: https://reviews.apache.org/r/4846/diff


Testing
-------

minimal.


Thanks,

Kenneth


               


[jira] [Commented] (QPID-3963) A federated broker may not reconnect to a remote cluster on link failure.

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/QPID-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264757#comment-13264757 ]

[hidden email] commented on QPID-3963:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4846/#review7376
-----------------------------------------------------------

Ship it!



/trunk/qpid/cpp/xml/cluster.xml
<https://reviews.apache.org/r/4846/#comment16280>

    I like this!


- Gordon


On 2012-04-26 19:19:31, Kenneth Giusti wrote:
bq.  [...]


               


[jira] [Commented] (QPID-3963) A federated broker may not reconnect to a remote cluster on link failure.

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/QPID-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265793#comment-13265793 ]

[hidden email] commented on QPID-3963:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4846/#review7420
-----------------------------------------------------------

Ship it!


- Alan


On 2012-04-26 19:19:31, Kenneth Giusti wrote:
bq.  [...]


               


[jira] [Resolved] (QPID-3963) A federated broker may not reconnect to a remote cluster on link failure.

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/QPID-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ken Giusti resolved QPID-3963.
------------------------------

       Resolution: Fixed
    Fix Version/s: 0.17
   

> A federated broker may not reconnect to a remote cluster on link failure.
> -------------------------------------------------------------------------
>
>                 Key: QPID-3963
>                 URL: https://issues.apache.org/jira/browse/QPID-3963
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker, C++ Clustering
>    Affects Versions: 0.14
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>             Fix For: 0.17
>
>
> When a broker is federated with a cluster, the cluster informs the broker of the failover addresses that are valid for the cluster.  Should a cluster member fail, the broker will reconnect to another member of that cluster.
> However, the federated broker only queries the cluster for these failover addresses when it first connects to the cluster.  Should the cluster topology change, the federated broker's list of available failover addresses will become out-of-date.  This can prevent the broker from correctly re-connecting on failure of a cluster member.
> Example:
> Given a cluster with members C1 and C2, and a separate broker B, federate B to connect to C1. On connecting to C1, B learns the address of C2 as an alternate failover address. Now shut down C1. B will reconnect to C2 and learn that C2 is the only member of the cluster (i.e., there are no failover addresses). After B connects, restart C1 and let it rejoin the cluster. Then shut down C2. Since B does not know that C1 has become available again, B will not attempt to reconnect to it. Instead, it tries to reconnect to C2 indefinitely.
> The expected behavior would be to have B reconnect to C1.



[jira] [Assigned] (QPID-3963) A federated broker may not reconnect to a remote cluster on link failure.

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/QPID-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Ross reassigned QPID-3963:
---------------------------------

    Assignee: Justin Ross  (was: Ken Giusti)
   

> A federated broker may not reconnect to a remote cluster on link failure.
> -------------------------------------------------------------------------
>
>                 Key: QPID-3963
>                 URL: https://issues.apache.org/jira/browse/QPID-3963
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker, C++ Clustering
>    Affects Versions: 0.14
>            Reporter: Ken Giusti
>            Assignee: Justin Ross
>             Fix For: 0.17
>
>
> When a broker is federated with a cluster, the cluster informs the broker of the failover addresses that are valid for the cluster.  Should a cluster member fail, the broker will reconnect to another member of that cluster.
> However, the federated broker only queries the cluster for these failover addresses when it first connects to the cluster.  Should the cluster topology change, the federated broker's list of available failover addresses will become out-of-date.  This can prevent the broker from correctly re-connecting on failure of a cluster member.
> Example:
> Given a cluster with members C1 and C2, and a separate broker B, federate B to connect to C1. On connecting to C1, B learns the address of C2 as an alternate failover address. Now shut down C1. B will reconnect to C2 and learn that C2 is the only member of the cluster (i.e., there are no failover addresses). After B connects, restart C1 and let it rejoin the cluster. Then shut down C2. Since B does not know that C1 has become available again, B will not attempt to reconnect to it. Instead, it tries to reconnect to C2 indefinitely.
> The expected behavior would be to have B reconnect to C1.



[jira] [Updated] (QPID-3963) A federated broker may not reconnect to a remote cluster on link failure.

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/QPID-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Ross updated QPID-3963:
------------------------------

    Assignee: Ken Giusti  (was: Justin Ross)
   

> A federated broker may not reconnect to a remote cluster on link failure.
> -------------------------------------------------------------------------
>
>                 Key: QPID-3963
>                 URL: https://issues.apache.org/jira/browse/QPID-3963
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker, C++ Clustering
>    Affects Versions: 0.14
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>             Fix For: 0.17
>
>
> When a broker is federated with a cluster, the cluster informs the broker of the failover addresses that are valid for the cluster.  Should a cluster member fail, the broker will reconnect to another member of that cluster.
> However, the federated broker only queries the cluster for these failover addresses when it first connects to the cluster.  Should the cluster topology change, the federated broker's list of available failover addresses will become out-of-date.  This can prevent the broker from correctly re-connecting on failure of a cluster member.
> Example:
> Given a cluster with members C1 and C2, and a separate broker B, federate B to connect to C1. On connecting to C1, B learns the address of C2 as an alternate failover address. Now shut down C1. B will reconnect to C2 and learn that C2 is the only member of the cluster (i.e., there are no failover addresses). After B connects, restart C1 and let it rejoin the cluster. Then shut down C2. Since B does not know that C1 has become available again, B will not attempt to reconnect to it. Instead, it tries to reconnect to C2 indefinitely.
> The expected behavior would be to have B reconnect to C1.

