Unexpected error from qpidd-cpp 1.39 when queue is faulty


Pål Skjager Løberg
Hi.

Today I found that sending messages to the broker failed with the
response "illegal-argument: Value for replyText is too large(320)". In a
thread from November 2018 (named "qpid-cpp-0.35 errors") I see other users
getting the same error message when unexpected things happen in the
broker.

In my case this happened due to an improper shutdown that left the queue
"operational" but unable to accept durable messages. In the syslog I found
errors like:
2019-05-14 09:04:28 [Broker] warning Exchange some.topic cannot deliver to
queue 440d04db-7fb6-3424-a83c-b70014fa32a0: Queue
440d04db-7fb6-3424-a83c-b70014fa32a0: MessageStoreImpl::store() failed:
jexception 0x010b LinearFileController::getCurrentSerial() threw
JERR__NULL: Operation on null pointer
(/var/tmp/portage/net-misc/qpid-cpp-1.39.0/work/qpid-cpp-1.39.0/src/qpid/linearstore/MessageStoreImpl.cpp:1226)
2019-05-14 09:04:28 [Broker] error Connection exception: framing-error:
Queue 440d04db-7fb6-3424-a83c-b70014fa32a0: MessageStoreImpl::store()
failed: jexception 0x010b LinearFileController::getCurrentSerial() threw
JERR__NULL: Operation on null pointer
(/var/tmp/portage/net-misc/qpid-cpp-1.39.0/work/qpid-cpp-1.39.0/src/qpid/linearstore/MessageStoreImpl.cpp:1226)
2019-05-14 09:04:28 [Protocol] error Connection
qpid.127.0.0.1:5672-127.0.0.1:57954 closed by error: Queue
440d04db-7fb6-3424-a83c-b70014fa32a0: MessageStoreImpl::store() failed:
jexception 0x010b LinearFileController::getCurrentSerial() threw
JERR__NULL: Operation on null pointer
(/var/tmp/portage/net-misc/qpid-cpp-1.39.0/work/qpid-cpp-1.39.0/src/qpid/linearstore/MessageStoreImpl.cpp:1226)(501)
2019-05-14 09:04:28 [Protocol] error Connection
qpid.127.0.0.1:5672-127.0.0.1:57954 closed by error: illegal-argument:
Value for replyText is too large(320)
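
To see which queues are affected, entries like the ones above can be grouped by queue id. A minimal sketch in Python, assuming the log format matches this excerpt (timestamp, bracketed category, level, message) and that entries may be wrapped across several lines; the function names are made up for illustration:

```python
import re
from collections import Counter

# Start of a log entry as it appears in the excerpt:
# "2019-05-14 09:04:28 [Broker] warning ..."
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[(\w+)\] (\w+) ")
# Queue id followed by the store failure text seen in the excerpt.
STORE_FAILURE = re.compile(r"Queue (\S+): MessageStoreImpl::store\(\) failed")


def join_wrapped_entries(lines):
    """Re-join log entries that were wrapped across several lines."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line.strip()
    return entries


def queues_with_store_failures(lines):
    """Count log entries reporting a store() failure, per queue id."""
    counts = Counter()
    for entry in join_wrapped_entries(lines):
        for queue_id in STORE_FAILURE.findall(entry):
            counts[queue_id] += 1
    return counts
```

Passing an open log file (or any iterable of lines) to `queues_with_store_failures` gives a per-queue count, which helps tell a single broken queue from a broker-wide problem.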

For a client, getting only "illegal-argument: Value for replyText is too
large" back when sending is not very useful. Especially after reading the
November thread, I suspect there is a bug in how error responses to the
client are generated, causing the actual error to be masked by another
error.
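
The suspected masking can be illustrated with a small sketch. This is not the broker's code: it assumes the reply text travels in a field with a one-byte length prefix (so at most 255 bytes), and the function names are hypothetical. With that assumption, a long store error fails to encode and the client sees only the encoding error, while truncating first keeps the start of the real message:

```python
# Assumption: the reply text field has a one-byte length prefix,
# so it can carry at most 255 bytes.
MAX_REPLY_TEXT_BYTES = 255


def encode_reply_text(text):
    """Encode text as length byte + payload; reject oversized text."""
    data = text.encode("utf-8")
    if len(data) > MAX_REPLY_TEXT_BYTES:
        # This secondary error is what ends up masking the original one.
        raise ValueError("Value for replyText is too large(%d)" % len(data))
    return bytes([len(data)]) + data


def truncate_reply_text(text, limit=MAX_REPLY_TEXT_BYTES):
    """Shorten text so that its UTF-8 encoding fits within limit bytes."""
    data = text.encode("utf-8")
    if len(data) <= limit:
        return text
    # Drop any partial multi-byte character left at the cut point.
    return data[:limit].decode("utf-8", errors="ignore")
```

Calling `encode_reply_text(truncate_reply_text(message))` then always succeeds, at the cost of losing the tail of very long messages.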

Also, it seems the Qpid broker can start with broken queues, so that it
fails only when messages are written to such a queue, with null pointer
problems among the symptoms.

Are either of these known issues, or is this the expected behavior?

--
Thanks,

 -- Paul

Re: Unexpected error from qpidd-cpp 1.39 when queue is faulty

Gordon Sim
On 14/05/2019 10:46 am, Pål Skjager Løberg wrote:

> For a client, just getting "illegal-argument: Value for replyText is too
> large" back as an error when sending is not the most useful info and I
> suspect, especially after reading the mentioned thread from November, there
> might be a bug in how the error responses to the client are generated
> causing the actual error to be masked by another error.
>
> Also, there seems to be a possibility that the Qpid broker will start with
> broken queues, causing it to fail only when messages are written to that
> queue, including some null pointer problems.
>
> Are any of these known issues or is it the expected behavior?

No, neither of these is the correct behaviour.

I have committed a fix for the first issue:
https://issues.apache.org/jira/browse/QPID-8313

For the issue with the journal recovery, I'd need to defer to the
expert. Kim, can you recommend any diagnostics to figure out what would
cause the problems in the queues on recovery? i.e. errors such as:

> jexception 0x010b LinearFileController::getCurrentSerial() threw
> JERR__NULL: Operation on null pointer




---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Unexpected error from qpidd-cpp 1.39 when queue is faulty

Justin Ross-3
Kim?


Re: Unexpected error from qpidd-cpp 1.39 when queue is faulty

Pavel Moravec
I am not as much of an expert on linearstore as Kim, but I have two ideas:

1) If you have thousands of durable queues, you can hit the kernel's limit
on AIO operations and need to increase the fs.aio-max-nr parameter. For
the calculation: I recall that on some systems (rhel6?) one durable queue
required 33 AIO handlers; on rhel7 it seems to be fewer (half?), but take
this as a rule of thumb only.

2) It seems the journal file handler has not been initialized, as it is a
null pointer. That could be a consequence of the improper shutdown (though
still a bug). If you don't care about the data in the queue, you can
replace the jrnl file(s) with an empty one (I can share the file). But I
expect you would like to get the data, in which case I would start by
enabling trace logs by adding

log-enable=trace+:linearstore
# only if not already logging somewhere, e.g. syslog (with trace logs not dropped):
log-to-file=/path/to/file.log

and observing how journal recovery happened on all jrnl files (or symlinks
to them) under

/var/lib/qpidd/.qpidd/qls/jrnl2/440d04db-7fb6-3424-a83c-b70014fa32a0

directory (here I deduce the uuid is a real queue name, per your error
logs).

I expect recovery of one jrnl file (the most recent) to fail in some
manner.
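
The rule of thumb in 1) can be turned into a quick estimate. A rough sketch in Python, assuming a Linux host where the limit is readable from /proc/sys/fs/aio-max-nr; the default of 33 per queue is only the recalled figure from above, the function names are made up for illustration, and other AIO users on the machine are ignored:

```python
AIO_MAX_NR_PATH = "/proc/sys/fs/aio-max-nr"  # Linux sysctl fs.aio-max-nr


def estimate_aio_needed(durable_queues, aio_per_queue=33):
    """Rule-of-thumb AIO demand: number of queues times per-queue cost."""
    return durable_queues * aio_per_queue


def read_aio_max_nr(path=AIO_MAX_NR_PATH):
    """Return the current kernel limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def report(durable_queues, aio_per_queue=33):
    """Compare the estimated demand with the kernel limit, if available."""
    needed = estimate_aio_needed(durable_queues, aio_per_queue)
    limit = read_aio_max_nr()
    if limit is None:
        return "estimated need: %d (current limit unknown)" % needed
    verdict = "ok" if needed <= limit else "limit too low"
    return "estimated need: %d, fs.aio-max-nr: %d (%s)" % (needed, limit, verdict)


if __name__ == "__main__":
    # 2000 durable queues is an arbitrary example value.
    print(report(durable_queues=2000))
```

If the estimate exceeds the limit, raising fs.aio-max-nr via sysctl is the change suggested in 1).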


Kind regards,
Pavel

