git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Artemis 2.6.2] AMQP messages going to expiry queue, redelivery off that queue is failing


Thank you guys so much!
On Tue, Sep 11, 2018 at 2:50 PM Timothy Bish <tabish121@xxxxxxxxx> wrote:

> On 08/30/2018 07:10 PM, Dan Langford wrote:
> > thanks for looking into this. what is the proper way to force for
> testing a
> > redelivery that goes back to the broker without transactions? its
> probably
> > like killing the session or connection. that would be if we wanted to
> test
> > if non-transacted redeliveries were getting corrupted
>
> The issue has been identified and fixed, should appear in the next release.
> https://issues.apache.org/jira/browse/ARTEMIS-2082
>
> >
> > we have a lot of people using spring-jms and as i look in the spring
> code...
> >
> https://github.com/spring-projects/spring-framework/blob/master/spring-jms/src/main/java/org/springframework/jms/listener/AbstractMessageListenerContainer.java
> > looking at doExecuteListener and rollbackOnExceptionIfNecessary
> > it looks like if its transacted they try to rollback() and if its
> > client_ack they try recover(). they probably dont handle auto_ack because
> > that was acked immediately?
> >
> > speaking of this what is an appropriate way to NACK a message when using
> > Client Acknowledgemnt and a JMS messageHandler? is it simply to reach the
> > end of the method execution without having called message.acknowledge()
> or
> > would it be appropriate to throw a RuntimeException (since i cannot
> throw a
> > checked Exception out of an implementation of javax.jms.MessageListener)
> ?
> >
> > On Thu, Aug 30, 2018 at 8:38 AM Robbie Gemmell <robbie.gemmell@xxxxxxxxx
> >
> > wrote:
> >
> >> Tim and I had an initial look at this, and can see generally where the
> >> broker is internally corrupting things on send, though not yet the
> >> full picture how it gets there or what to do about it. The expiration
> >> is likely to be key, one difference with the non-transacted case is
> >> actually going to be because its using recover() which the client
> >> performs locally.
> >>
> >> Robbie
> >>
> >> On Wed, 29 Aug 2018 at 06:23, Dan Langford <danlangford@xxxxxxxxx>
> wrote:
> >>> ok i wrote 3 test files. I don't know the best way to get them to you
> >>> easily. hopefully a Gist is ok.
> >>> https://gist.github.com/danlangford/071e738225ec0c68dd470816b977499b
> >>>
> >>> you can copy those 3 files straight to
> >>>
> >>
> ./tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/amqp
> >>> The test JMSTransactedRedeliveryTest::testAMQPProducerAMQPConsumer
> proves
> >>> that a transacted client can .rollback() a handful of times and still
> be
> >>> able to consume the redelivered message later on.
> >>>
> >>> The test JMSTransactedRedeliveryBugTest::testAMQPProducerAMQPConsumer
> >> shows
> >>> that if a message had been expired and now a transacted client is
> >>> attempting to consume it the client only has 2 chances before the
> broker
> >>> starts sending the message in a way that will not parse correctly
> >>>
> >>> The test
> JMSNonTransactedRedeliveryBugTest::testAMQPProducerAMQPConsumer
> >> shows
> >>> that if a message had been expired a non-transacted client has no
> >> troubles
> >>> reliably accessing the redelivered message from broker
> >>>
> >>>
> >>> as you can tell i am mostly concerned about AMQP->AMQP for my use case.
> >>> some of those other combos are failing some of these tests in other
> ways.
> >>> naturally you can address those as you see fit but for my client the
> >>> AMQP->AMQP is a roadblocker.
> >>>
> >>>
> >>> let me know if you can determine why the broker is sending an extra
> null
> >>> character in the payload on the third time the messages attempts
> >> delivery.
> >>> maybe we are doing something incorrectly.
> >>>
> >>>
> >>> This has been more of an issue than i thought due to the fact that
> Spring
> >>> default to enabling transactions. in all of my initial tests i couldn't
> >>> reproduce it because i prefer the straight simplified jms api from 2.0
> >> and
> >>> that defaults to sessions not being transacted. that being said nearly
> >> all
> >>> of my clients prefer using Spring Boot autoconfigurer and other spring
> >>> pieces which happen to default to transacted sessions.  i can now have
> >> some
> >>> of them workaround but others of them are requiring the transaction.
> >>>
> >>>
> >>> also as a reminder and for context here is a link to the initial
> >>> conversation i had with the Qpid Jms Client devs who pointed out to me
> >> the
> >>> erroneous null character in the message transfer from the broker:
> >>> https://lists.apache.org/
> >>> thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@%
> >> 3Cusers.qpid.apache.org%3E
> >>>
> >>> thank you so much for your time
> >>>
> >>> On Tue, Aug 14, 2018 at 1:19 PM Timothy Bish <tabish121@xxxxxxxxx>
> >> wrote:
> >>>> On 08/13/2018 07:12 PM, Dan Langford wrote:
> >>>>> some of my users are attempting a pattern to deduplicate messages
> >> based
> >>>> on
> >>>>> a time window instead of a fixed amount of space (a duplicate id
> >> cache)
> >>>>> so far the concept has been working very well. So they send their
> >> AMQP
> >>>>> messages (qpid-jms-client) into a Last Value Queue with an
> >> appropriate
> >>>>> identifier in the _AMQ_LVQ_NAME. They also set a TimeToLive on the
> >>>> message
> >>>>> that is essentially the lag they will allow as they want to wait for
> >>>>> possible duplicates. If any duplicates come in the Last Value Queue
> >>>>> behavior is replacing the older message with the newer message until
> >> the
> >>>>> expiration. expired messages are delivered to the preconfigured
> >> expiry
> >>>>> queue where their application is listening. This is not perfect but
> >> its
> >>>> not
> >>>>> intended to be. Its just intended to reduce additional unnecessary
> >>>>> processing and they understand this is not a guarantee. It really
> >> helps
> >>>>> with a system that produces messages in a way that has flurries of
> >>>>> "notifications" about the same assetID over and over again.
> >>>>>
> >>>>> BUT where we are seeing is a problem is when we are consuming from
> >> the
> >>>>> queue used to hold expired messages and we toss some exception and
> >> the
> >>>>> message needs to be redelivered. the first time or two the message is
> >>>>> redelivered it is delivered OK. But when the JMSXDeliveryCount is
> >> about 3
> >>>>> or 4 (we use redelivery delay and multipliers to spread these out)
> >> our
> >>>>> qpid-jms-client stops being able to read the messages.
> >>>>>
> >>>>> we were only able to reproduce this when an AMQP message expired
> >> onto the
> >>>>> queue. (expired from a LVQ in case that is relevant). if we place the
> >>>>> message directly on a queue and test different exception and
> >> redelivery
> >>>>> scenarios we cannot reproduce this behavior.
> >>>>>
> >>>>> i enable the qpid-jms-client frame logging (via env variable
> >>>>> PN_TRACE_FRM=true) and i saw that in the situation when the client
> >> code
> >>>>> cannot access the payload, even though the broker WAS still sending
> >> the
> >>>>> payload. so i thought it was some odd issue with the client. The
> >> Apache
> >>>>> Qpid team responded that the issue seems to be that the broker
> >> starts to
> >>>>> send some ill formed payloads in this scenario. i dont want to
> >> repeat the
> >>>>> stack traces and their response, you can read those here
> >>>>>
> >>>>>
> >>
> https://lists.apache.org/thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@%3Cusers.qpid.apache.org%3E
> >>>>> would it be helpful if i tested that this happens if there is not a
> >> LVQ
> >>>>> involved? i could have a message in a non-LVQ expire to another
> >> queue and
> >>>>> see if redeliveries over their get messed up after a few attempts.
> >> For
> >>>> the
> >>>>> record this is AMQP for producing and consuming. i do notice the
> >> messages
> >>>>> waiting in the expiry queue have much more headers messages sent
> >> directly
> >>>>> to a queue from client code. they seem to be headers full of
> >> information
> >>>>> about the message as it left the previous queue. I tried to send a
> >>>> message
> >>>>> directly to the expiry queue with all these headers to determine if
> >> it
> >>>> was
> >>>>> the existence of one of these specifically that trigger the malformed
> >>>> frame
> >>>>> but was not able to fully set all those headers. the JMSDeliverCount
> >>>> (type
> >>>>> Long) was the one that the client would not let me set and as a
> >> result i
> >>>>> could not test. for clarity thought i dont know that the issue
> >> exists due
> >>>>> to a header that is just what i saw as a difference between messages
> >> be
> >>>>> delivered to the queue by client code versus messages expiring from
> >> one
> >>>>> queue to another.
> >>>>>
> >>>>> please look over the linked thread on the qpid list and let me know
> >> if
> >>>> you
> >>>>> know why a message transfer fram would become malformed after a few
> >>>> failed
> >>>>> deliveries only if the message expired onto the current queue.
> >>>>>
> >>>>> thanks so much
> >>>>>
> >>>> A great place to start is to create a unit test that reproduces the
> >>>> issue.  You can look at the Artemis unit tests for AMQP to get some
> >>>> inspiration on how to set one up.  Then try and create the smallest
> >>>> possible test that can reproduce the issue to make it easier to narrow
> >>>> in on where the issue might be.
> >>>>
> >>>> The AMQP tests in Artemis are located here:
> >>>>
> >>>>
> >>>>
> >>
> https://github.com/apache/activemq-artemis/tree/master/tests/integration-tests/src/test/java/org/apache/activemq/artemis/tests/integration/amqp
> >>>>
> >>>>
> >>>> --
> >>>> Tim Bish
> >>>>
> >>>>
>
> --
> Tim Bish
> twitter: @tabish121
> blog: http://timbish.blogspot.com/
>
>