
Can't get ActiveMQ Artemis 2.6.2 shared store failover to work


Hi,

I'm new to ActiveMQ, so I have a couple of conceptual questions as well as
a technical one.

I'd like to set up Artemis in a high-availability configuration, so that the
queue system as a whole keeps working even if I take individual machines in
the cluster offline. I'm familiar with Kafka, which provides this via a
ZooKeeper quorum, replication, and leader election.

Going by the documentation at
https://activemq.apache.org/artemis/docs/latest/ha.html, I get the
impression that each live server can only have a single backup. Is it
possible to have more than one backup of the data?

My understanding is that to avoid split brain (
https://activemq.apache.org/artemis/docs/latest/network-isolation.html),
I'd need at least 6 servers (3 live, 3 backup). Is this correct, and does
it only apply to replication HA, or also to shared store HA?

I wanted to try out shared store failover behavior, so I set up two brokers
locally using the following commands:

./artemis create --clustered --shared-store \
  --data /mnt/c/artemis-data \
  --host localhost --http-port 8161 \
  --failover-on-shutdown --default-port 61616 \
  /mnt/c/artemis-master

./artemis create --clustered --shared-store \
  --data /mnt/c/artemis-data \
  --host localhost --http-port 8162 \
  --failover-on-shutdown --default-port 61617 \
  --slave /mnt/c/artemis-slave
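For reference, my understanding from the documentation is that these commands
should produce broker.xml files with ha-policy sections along these lines
(a sketch based on the docs, not copied verbatim from my generated files):

```xml
<!-- master broker.xml (sketch; the generated file may differ slightly) -->
<ha-policy>
   <shared-store>
      <master>
         <failover-on-shutdown>true</failover-on-shutdown>
      </master>
   </shared-store>
</ha-policy>

<!-- slave broker.xml (sketch) -->
<ha-policy>
   <shared-store>
      <slave>
         <failover-on-shutdown>true</failover-on-shutdown>
      </slave>
   </shared-store>
</ha-policy>
```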

I can't get the backup to take over when doing this. Both brokers
continuously log the following warning:

2018-08-01 11:48:33,897 WARN  [org.apache.activemq.artemis.core.client]
AMQ212034: There are more than one servers on the network broadcasting the
same node id. You will see this message exactly once (per node) if a node
is restarted, in which case it can be safely ignored. But if it is logged
continuously it means you really do have more than one node on the same
network active concurrently with the same node id. This could occur if you
have a backup node active at the same time as its live node.
nodeID=cb201578-9580-11e8-b925-f01faf531f94

For completeness, I also tried replacing the broadcast/discovery groups in
broker.xml with a static-connector configuration. That gets rid of the
warning, but the backup still doesn't take over when I kill the master
process. The backup broker clearly logs that it has lost its connection to
another server in the cluster, but it never seems to assume the live role.
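For reference, the static-connector setup I tried looked roughly like this
(simplified sketch; ports match the brokers above, and on the master the
connector-ref in static-connectors points the other way):

```xml
<!-- slave's broker.xml (sketch): connect statically to the master -->
<connectors>
   <connector name="artemis">tcp://localhost:61617</connector>
   <connector name="master">tcp://localhost:61616</connector>
</connectors>

<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>artemis</connector-ref>
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <max-hops>0</max-hops>
      <static-connectors>
         <connector-ref>master</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>
```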

Am I misunderstanding how the failover should work, or is there something
wrong with the configuration?