git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [artemis 2.1.0] taking 30+ minutes to boot & failover


On Fri, Jul 20, 2018 at 4:47 PM, Dan Langford <danlangford@xxxxxxxxx> wrote:
> Thank you that was very helpful. we actually do have an address settings
> entry for each queue. there could be a better pattern for us. but currently
> our automated system for creating queues creates an address setting at the
> same time. i will look into improved patterns.
>
> as far as upgrading goes. i agree we really want to upgrade. until i can
> find a work around for the config-delete-queues deserialization bug
> introduced in 2.2.0 i brought up back in April we will not be able to
> easily move.

You should be able to move to 2.6.2. if you're not able to I would
like to know where it failed.


>
> thanks again for all the help
>
> On Fri, Jul 20, 2018 at 8:56 AM Clebert Suconic <clebert.suconic@xxxxxxxxx>
> wrote:
>
>> If you do not want to upgrade for any reason export the journal. Cleani
>> uo.  Edit the text and remove the garbage (you will see) manually.   Delete
>> all data and te import.
>>
>> (Make a backup to be safe of course)
>>
>>
>> But I still recommend the upgrade.
>>
>> On Fri, Jul 20, 2018 at 10:54 AM Clebert Suconic <
>> clebert.suconic@xxxxxxxxx>
>> wrote:
>>
>> > The address setting is the garbage I was talking about.  Upgrade to the
>> > latest broker and there will be a cleanup done at the load before it
>> > starts.
>> >
>> >
>> > I highly recommend upgrade.
>> >
>> > On Fri, Jul 20, 2018 at 10:05 AM Justin Bertram <jbertram@xxxxxxxxxx>
>> > wrote:
>> >
>> >> Analyzing thread dumps like this is pretty simple.  I generally just
>> >> scroll
>> >> through and look for long stack-traces with lots of calls from
>> >> org.apache.activemq.artemis.  In your case every single thread dump has
>> a
>> >> thread doing something like this:
>> >>
>> >> "main" #1 prio=5 os_prio=0 tid=0x00007f902800eb20 nid=0x74fe runnable
>> >> [0x00007f9031675000]
>> >>    java.lang.Thread.State: RUNNABLE
>> >>     at
>> >>
>> >>
>> org.apache.activemq.artemis.core.settings.impl.HierarchicalObjectRepository.getPossibleMatches(HierarchicalObjectRepository.java:373)
>> >>     at
>> >>
>> >>
>> org.apache.activemq.artemis.core.settings.impl.HierarchicalObjectRepository.getMatch(HierarchicalObjectRepository.java:192)
>> >>     at
>> >>
>> >>
>> org.apache.activemq.artemis.core.paging.impl.PagingManagerImpl.reapplySettings(PagingManagerImpl.java:113)
>> >>     at
>> >>
>> >>
>> org.apache.activemq.artemis.core.paging.impl.PagingManagerImpl.onChange(PagingManagerImpl.java:108)
>> >>     at
>> >>
>> >>
>> org.apache.activemq.artemis.core.settings.impl.HierarchicalObjectRepository.onChange(HierarchicalObjectRepository.java:348)
>> >>     at
>> >>
>> >>
>> org.apache.activemq.artemis.core.settings.impl.HierarchicalObjectRepository.addMatch(HierarchicalObjectRepository.java:168)
>> >>     at
>> >>
>> >>
>> org.apache.activemq.artemis.core.settings.impl.HierarchicalObjectRepository.addMatch(HierarchicalObjectRepository.java:147)
>> >>     at
>> >>
>> >>
>> org.apache.activemq.artemis.core.settings.impl.HierarchicalObjectRepository.addMatch(HierarchicalObjectRepository.java:120)
>> >>     at
>> >>
>> >>
>> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.recoverStoredConfigs(ActiveMQServerImpl.java:2424)
>> >>     at
>> >>
>> >>
>> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:2374)
>> >>     at
>> >>
>> >>
>> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:2219)
>> >>     - locked <0x0000000080a8fce8> (a
>> >> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl)
>> >>     at
>> >>
>> >>
>> org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation.run(SharedNothingLiveActivation.java:109)
>> >>     at
>> >>
>> >>
>> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.internalStart(ActiveMQServerImpl.java:518)
>> >>     at
>> >>
>> >>
>> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.start(ActiveMQServerImpl.java:466)
>> >>     - locked <0x0000000080a8fce8> (a
>> >> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl)
>> >>     at
>> >>
>> >>
>> org.apache.activemq.artemis.integration.FileBroker.start(FileBroker.java:111)
>> >>     - locked <0x000000008098be80> (a
>> >> org.apache.activemq.artemis.integration.FileBroker)
>> >>     at org.apache.activemq.artemis.cli.commands.Run.execute(Run.java:73)
>> >>     at
>> >>
>> org.apache.activemq.artemis.cli.Artemis.internalExecute(Artemis.java:148)
>> >>     at org.apache.activemq.artemis.cli.Artemis.execute(Artemis.java:95)
>> >>     at org.apache.activemq.artemis.cli.Artemis.execute(Artemis.java:122)
>> >>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >>     at
>> >>
>> >>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> >>     at
>> >>
>> >>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >>     at java.lang.reflect.Method.invoke(Method.java:498)
>> >>     at
>> org.apache.activemq.artemis.boot.Artemis.execute(Artemis.java:129)
>> >>     at org.apache.activemq.artemis.boot.Artemis.main(Artemis.java:49)
>> >>
>> >> In every case it's the "main" thread which isn't surprising as that is
>> the
>> >> thread responsible for starting the broker.  Also, you can pretty
>> clearly
>> >> see in the trace that this is the thread starting the broker, it's
>> loading
>> >> the journals, & restoring stored configuration (either address settings
>> or
>> >> security settings).  I've seen high broker start times when there are
>> lots
>> >> and lots of addresses and lots of and lots of settings.  Do either (or
>> >> both) of these situations apply to you?
>> >>
>> >>
>> >> Justin
>> >>
>> >> On Fri, Jul 20, 2018 at 1:16 AM, Dan Langford <danlangford@xxxxxxxxx>
>> >> wrote:
>> >>
>> >> > Dang I can’t easily upgrade past 2.1.0 because of the
>> >> config-delete-queues
>> >> > deserialization bug introduced in 2.2.0. Unless that bug was squashed
>> in
>> >> > 2.6+. I don’t think I made a jira for it (vacation and work load) but
>> we
>> >> > discussed it back in April. I should go confirm that bug on 2.6 and
>> >> make a
>> >> > jira for that
>> >> >
>> >> > Thanks
>> >> > On Thu, Jul 19, 2018 at 5:46 PM Clebert Suconic <
>> >> clebert.suconic@xxxxxxxxx
>> >> > >
>> >> > wrote:
>> >> >
>> >> > > There is an issue I remember where the journal would have some dirt
>> >> that
>> >> > > was fixed on 2.3/0.
>> >> > >
>> >> > > I would ipgrade to 2.6.2.
>> >> > >
>> >> > > On Thu, Jul 19, 2018 at 6:34 PM Dan Langford <danlangford@xxxxxxxxx
>> >
>> >> > > wrote:
>> >> > >
>> >> > > > would you be willing to help me translate these thread dumps?
>> >> > > >
>> >> > > > i attached a Zip file with some thread dumps in them. i will also
>> >> share
>> >> > > > the fasthread.io links for each file. (i was struggling getting
>> >> > > > fastthread to do a combo report with the threads in the correct
>> >> order)
>> >> > > >
>> >> > > > artemis04-20180719-1525 https://goo.gl/d88azU
>> >> > > > artemis04-20180719-1530 https://goo.gl/G78qn3
>> >> > > > artemis04-20180719-1535 https://goo.gl/aMBSBw
>> >> > > > artemis04-20180719-1540 https://goo.gl/brKxxk
>> >> > > > artemis04-20180719-1545 https://goo.gl/RaXXCs
>> >> > > > artemis04-20180719-1550 https://goo.gl/r5dndK
>> >> > > > artemis04-20180719-1555 https://goo.gl/YJRLxe
>> >> > > >
>> >> > > > at :35, :45, and :55 the young+old gen space gets bigger than at
>> the
>> >> > > other
>> >> > > > sample times. but i dont know what to look for in here to
>> determine
>> >> > what
>> >> > > > the broker is actually during during this time.
>> >> > > >
>> >> > > > thanks
>> >> > > >
>> >> > > > On Fri, Jul 6, 2018 at 10:42 AM Justin Bertram <
>> jbertram@xxxxxxxxxx
>> >> >
>> >> > > > wrote:
>> >> > > >
>> >> > > >> The first place I would start is grabbing thread dumps every so
>> >> often
>> >> > to
>> >> > > >> see what the broker is actually doing during the 30+ minutes.
>> >> > > >>
>> >> > > >>
>> >> > > >> Justin
>> >> > > >>
>> >> > > >> On Fri, Jul 6, 2018 at 11:34 AM, Dan Langford <
>> >> danlangford@xxxxxxxxx>
>> >> > > >> wrote:
>> >> > > >>
>> >> > > >> > so my server startup times and failover times are growing
>> pretty
>> >> > big.
>> >> > > >> but i
>> >> > > >> > dont really know where to start looking.
>> >> > > >> >
>> >> > > >> > here is a snippet of some logs to show you the time stamps:
>> >> > > >> >
>> >> > > >> > 08:11:31,801 INFO  [org.apache.activemq.artemis.
>> >> > integration.bootstrap]
>> >> > > >> > AMQ101000: Starting ActiveMQ Artemis Server
>> >> > > >> >
>> >> > > >> > 08:11:31,818 INFO  [org.apache.activemq.artemis.core.server]
>> >> > > AMQ221000:
>> >> > > >> > live Message Broker is starting with configuration Broker
>> >> > > Configuration
>> >> > > >> > (clustered=true,journalDirectory=./data/
>> >> > journal,bindingsDirectory=./
>> >> > > >> > data/bindings,largeMessagesDirectory=./data/large-messages,
>> >> > > >> > pagingDirectory=./data/paging)
>> >> > > >> >
>> >> > > >> > 08:11:34,462 INFO  [org.apache.activemq.artemis.core.server]
>> >> > > AMQ221012:
>> >> > > >> > Using AIO Journal
>> >> > > >> >
>> >> > > >> > 08:11:34,493 INFO  [org.apache.activemq.artemis.core.server]
>> >> > > AMQ221057:
>> >> > > >> > Global Max Size is being adjusted to 1/2 of the JVM max size
>> >> (-Xmx).
>> >> > > >> being
>> >> > > >> > defined as 1,073,741,824
>> >> > > >> >
>> >> > > >> > 08:11:34,555 INFO  [org.apache.activemq.artemis.core.server]
>> >> > > AMQ221043:
>> >> > > >> > Protocol module found: [artemis-server]. Adding protocol
>> support
>> >> > for:
>> >> > > >> CORE
>> >> > > >> >
>> >> > > >> > 08:11:34,555 INFO  [org.apache.activemq.artemis.core.server]
>> >> > > AMQ221043:
>> >> > > >> > Protocol module found: [artemis-amqp-protocol]. Adding protocol
>> >> > > support
>> >> > > >> > for: AMQP
>> >> > > >> >
>> >> > > >> > 08:11:34,556 INFO  [org.apache.activemq.artemis.core.server]
>> >> > > AMQ221043:
>> >> > > >> > Protocol module found: [artemis-hornetq-protocol]. Adding
>> >> protocol
>> >> > > >> support
>> >> > > >> > for: HORNETQ
>> >> > > >> >
>> >> > > >> > 08:11:34,556 INFO  [org.apache.activemq.artemis.core.server]
>> >> > > AMQ221043:
>> >> > > >> > Protocol module found: [artemis-mqtt-protocol]. Adding protocol
>> >> > > support
>> >> > > >> > for: MQTT
>> >> > > >> >
>> >> > > >> > 08:11:34,556 INFO  [org.apache.activemq.artemis.core.server]
>> >> > > AMQ221043:
>> >> > > >> > Protocol module found: [artemis-openwire-protocol]. Adding
>> >> protocol
>> >> > > >> support
>> >> > > >> > for: OPENWIRE
>> >> > > >> >
>> >> > > >> > 08:11:34,556 INFO  [org.apache.activemq.artemis.core.server]
>> >> > > AMQ221043:
>> >> > > >> > Protocol module found: [artemis-stomp-protocol]. Adding
>> protocol
>> >> > > support
>> >> > > >> > for: STOMP
>> >> > > >> >
>> >> > > >> > 08:41:38,963 WARN  [org.apache.activemq.artemis.core.server]
>> >> > > AMQ222165:
>> >> > > >> No
>> >> > > >> > Dead Letter Address configured for queue DLQ in AddressSettings
>> >> > > >> >
>> >> > > >> > 08:41:38,963 WARN  [org.apache.activemq.artemis.core.server]
>> >> > > AMQ222166:
>> >> > > >> No
>> >> > > >> > Expiry Address configured for queue DLQ in AddressSettings
>> >> > > >> >
>> >> > > >> > 08:41:38,983 WARN  [org.apache.activemq.artemis.core.server]
>> >> > > AMQ222165:
>> >> > > >> No
>> >> > > >> > Dead Letter Address configured for queue ExpiryQueue in
>> >> > > AddressSettings
>> >> > > >> >
>> >> > > >> > 08:41:38,983 WARN  [org.apache.activemq.artemis.core.server]
>> >> > > AMQ222166:
>> >> > > >> No
>> >> > > >> > Expiry Address configured for queue ExpiryQueue in
>> >> AddressSettings
>> >> > > >> >
>> >> > > >> > 08:41:38,984 WARN  [org.apache.activemq.artemis.core.server]
>> >> > > AMQ222165:
>> >> > > >> No
>> >> > > >> > Dead Letter Address configured for queue example in
>> >> AddressSettings
>> >> > > >> >
>> >> > > >> > 08:41:38,985 WARN  [org.apache.activemq.artemis.core.server]
>> >> > > AMQ222166:
>> >> > > >> No
>> >> > > >> > Expiry Address configured for queue example in AddressSettings
>> >> > > >> >
>> >> > > >> > 08:41:38,985 WARN  [org.apache.activemq.artemis.core.server]
>> >> > > AMQ222165:
>> >> > > >> No
>> >> > > >> > Dead Letter Address configured for queue exampleQueue in
>> >> > > AddressSettings
>> >> > > >> >
>> >> > > >> > 08:41:38,986 WARN  [org.apache.activemq.artemis.core.server]
>> >> > > AMQ222166:
>> >> > > >> No
>> >> > > >> > Expiry Address configured for queue exampleQueue in
>> >> AddressSettings
>> >> > > >> >
>> >> > > >> >
>> >> > > >> > and it continues. i have 138 queues. i wonder if i need to be
>> >> > looking
>> >> > > at
>> >> > > >> > PAGE configuration or some cache sizes. do i need to be looking
>> >> at
>> >> > the
>> >> > > >> > number of messages persisted on these queues? where would you
>> >> look
>> >> > to
>> >> > > >> > determine why the startup times are so long?
>> >> > > >> >
>> >> > > >> > also, i know i need to upgrade but i cannot upgrade off of
>> 2.1.0
>> >> due
>> >> > > to
>> >> > > >> > some deseralization changes introduced in 2.2.0. i think i have
>> >> > > another
>> >> > > >> > thread on here (that i need to update) regarding that upgrade
>> >> issue.
>> >> > > >> >
>> >> > > >> > thanks for any ideas or insight you have for me
>> >> > > >> >
>> >> > > >>
>> >> > > > --
>> >> > > Clebert Suconic
>> >> > >
>> >> >
>> >>
>> > --
>> > Clebert Suconic
>> >
>> --
>> Clebert Suconic
>>



-- 
Clebert Suconic