Re: Can't get ActiveMQ Artemis 2.6.2 shared store failover to work


> Is it possible to have more than one backup of the data?

It is not possible to have more than one backup of the data managed by the
broker.  You are, of course, free to use other technology to replicate the
data underneath the broker (e.g. a replicated filesystem).

> My understanding is that to avoid split brain (
> https://activemq.apache.org/artemis/docs/latest/network-isolation.html),
> I'd need at least 6 servers (3 live, 3 backup). Is this correct, and does
> it only apply to replication HA, or also to shared store HA?

To reduce the chance of split brain it is recommended to have an odd
number of live brokers in the cluster so a majority is easy to establish.
The smallest odd number larger than 1 is 3.  Hence the recommendation for
3 live/backup pairs.  Keep in mind that these can be colocated to avoid
wasting resources.
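
To make the "odd number" point concrete, here is a small sketch of plain
majority arithmetic.  It is not Artemis code and does not reproduce the
broker's actual quorum voting; the class and method names are made up for
illustration.

```java
// Illustrative only: generic majority arithmetic, not the broker's voting code.
final class QuorumMath {

    // Smallest number of votes that is strictly more than half of the brokers.
    static int majority(int liveBrokers) {
        return liveBrokers / 2 + 1;
    }

    // How many brokers can be lost while the remainder still forms a majority.
    static int tolerated(int liveBrokers) {
        return liveBrokers - majority(liveBrokers);
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 5; n++) {
            System.out.println(n + " live brokers: majority=" + majority(n)
                    + ", tolerated failures=" + tolerated(n));
        }
    }
}
```

With 2 live brokers no failure can be tolerated, 3 tolerate one, and going
from 3 to 4 adds nothing, which is why odd numbers starting at 3 are the
usual recommendation.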

Split brain is only a problem in the case of replication.  In the
shared-store use-case the shared storage itself guards against split
brain, provided it implements file locks properly.

> Am I misunderstanding how the failover should work, or is there something
> wrong with the configuration?

I can think of two options here off the top of my head:

  1) The shared storage doesn't properly implement file locks.  Can you
elaborate on what "/mnt/c/artemis-data" is?  Is it NFS or some other kind
of NAS?
  2) There is a bug in the way "artemis create" generates the
configuration.  Could you paste (or pastebin) the configuration from both
the live and the backup?
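
A rough way to check option 1 is a small lock probe like the sketch below.
It assumes the broker relies on the same OS-level file locks that java.nio
exposes; the class name, the probe file and the hold time are hypothetical
and not part of Artemis.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Artemis.
final class LockProbe {

    // Tries to take an exclusive lock on the file without blocking.
    // Returns true if this process got the lock (and held it for holdMillis),
    // false if another process already holds it.
    static boolean probe(Path file, long holdMillis)
            throws IOException, InterruptedException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock(); // null if held by another process
            if (lock == null) {
                return false;
            }
            try {
                Thread.sleep(holdMillis); // keep the lock so a second run can collide
                return true;
            } finally {
                lock.release();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Paths.get(args[0]);
        long holdMillis = args.length > 1 ? Long.parseLong(args[1]) : 30_000L;
        System.out.println(probe(file, holdMillis)
                ? "lock acquired and released"
                : "lock is held by another process");
    }
}
```

Start it in two terminals against a scratch file in the shared directory,
e.g. "java LockProbe /mnt/c/artemis-data/probe.lock".  While the first run
is still holding the lock, the second should report that the lock is held
by another process.  If both report that they acquired it, the storage is
not enforcing locks between processes.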


Justin


On Wed, Aug 1, 2018 at 7:57 AM, Stig Rohde Døssing <stigdoessing@xxxxxxxxx>
wrote:

> Hi,
>
> I'm new to ActiveMQ, so I have a couple of conceptual questions as well as
> a technical one.
>
> I'd like to set up Artemis in a high availability configuration, so the
> queue system as a whole keeps working, even if I disable single machines in
> the cluster. I'm familiar with Kafka, which provides this ability via a
> Zookeeper quorum, replication and leader elections.
>
> Going by the documentation at
> https://activemq.apache.org/artemis/docs/latest/ha.html, I get the
> impression that each live server can only have a single backup. Is it
> possible to have more than one backup of the data?
>
> My understanding is that to avoid split brain (
> https://activemq.apache.org/artemis/docs/latest/network-isolation.html),
> I'd need at least 6 servers (3 live, 3 backup). Is this correct, and does
> it only apply to replication HA, or also to shared store HA?
>
> I wanted to try out shared store failover behavior, so I set up two brokers
> locally using the following commands:
>
> ./artemis create --clustered --shared-store --data /mnt/c/artemis-data
> --host localhost --http-port 8161 --failover-on-shutdown --default-port
> 61616 /mnt/c/artemis-master
>
> ./artemis create --clustered --shared-store --data /mnt/c/artemis-data
> --host localhost --http-port 8162 --failover-on-shutdown --default-port
> 61617 --slave /mnt/c/artemis-slave
>
> I can't get the backup to take over when doing this. The log is spammed
> with the following message in both brokers:
>
> 2018-08-01 11:48:33,897 WARN  [org.apache.activemq.artemis.core.client]
> AMQ212034: There are more than one servers on the network broadcasting the
> same node id. You will see this message exactly once (per node) if a node
> is restarted, in which case it can be safely ignored. But if it is logged
> continuously it means you really do have more than one node on the same
> network active concurrently with the same node id. This could occur if you
> have a backup node active at the same time as its live node.
> nodeID=cb201578-9580-11e8-b925-f01faf531f94
>
> Just for the sake of completeness, I tried replacing the
> broadcast/discovery group in broker.xml with static-connector
> configuration, and this gets rid of this warning but the backup still won't
> take over for the master when I kill the master process. The backup broker
> clearly logs that it has lost connection to another server in the cluster,
> but it doesn't seem to take the live role.
>
> Am I misunderstanding how the failover should work, or is there something
> wrong with the configuration?
>