
Re: Designing for maximum Artemis performance


Hi Tim,

> First, spinning up a replacement host is not instantaneous, so there will
> be a period of at least a minute but possibly several where the messages on
> that broker and storage volume will simply be unavailable to consumers.

If you decide to use HA with a shared store, it will take some time for the slave to start as well. It needs to load the journal directory, and when the journal holds a few GBs that can take a while; it's the longest part of starting the broker. It's best practice to have a good health check on the master so you can restart a new master as soon as possible. I don't recommend HA with a replicated journal in a cloud environment, because cloud networks usually have high latency, are unreliable, and are hard to configure. That setup is very hard to make robust and fast.
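For reference, the shared-store master/slave roles are set via the ha-policy element in each broker's broker.xml. This is a minimal sketch using the standard Artemis element names; tune it to your own deployment:

```xml
<!-- master broker.xml: live broker using the shared store -->
<ha-policy>
  <shared-store>
    <master>
      <!-- lets a cleanly stopped master hand over to the slave -->
      <failover-on-shutdown>true</failover-on-shutdown>
    </master>
  </shared-store>
</ha-policy>

<!-- slave broker.xml: backup that activates when the master's file lock is released -->
<ha-policy>
  <shared-store>
    <slave>
      <!-- gives control back to the master once it restarts -->
      <allow-failback>true</allow-failback>
    </slave>
  </shared-store>
</ha-policy>
```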

> Second, it means that there is only one copy of a given message within the
> broker cluster, so if that storage volume gets corrupted or fails, you've
> lost data, which would be unacceptable in some use cases.

Leave fault tolerance and redundancy to the storage of your cloud provider. It can be backed by RAID for fault tolerance, and you can possibly replicate your storage to a backup location. As mentioned, in the case of HA with a shared store, if the storage gets corrupted then the backup will not start anyway. So RAID, and possibly replication at the storage level, sounds like the better option.
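In a shared-store pair, both brokers must point their data directories at the same mounted volume (the one your provider protects with RAID/replication). A sketch of the relevant broker.xml entries; the mount path is just an example:

```xml
<!-- identical in the master's and the slave's broker.xml -->
<paging-directory>/mnt/shared-store/paging</paging-directory>
<bindings-directory>/mnt/shared-store/bindings</bindings-directory>
<journal-directory>/mnt/shared-store/journal</journal-directory>
<large-messages-directory>/mnt/shared-store/large-messages</large-messages-directory>
```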

Thanks,
Mirek

----- Original Message -----
> From: "Tim Bain" <tbain@xxxxxxxxxxxxxxx>
> To: "ActiveMQ Users" <users@xxxxxxxxxxxxxxxxxxx>
> Sent: Thursday, 4 October, 2018 3:01:52 PM
> Subject: Re: Designing for maximum Artemis performance
> 
> Justin,
> 
> That approach will work, to a point, but it has (at least) two failure
> cases that would be problematic.
> 
> First, spinning up a replacement host is not instantaneous, so there will
> be a period of at least a minute but possibly several where the messages on
> that broker and storage volume will simply be unavailable to consumers.
> 
> Second, it means that there is only one copy of a given message within the
> broker cluster, so if that storage volume gets corrupted or fails, you've
> lost data, which would be unacceptable in some use cases.
> 
> There'd also be a failure case if the number of hosts was not an even
> multiple of the number of AZs, where the new host comes up in a different
> AZ than the storage volume, and therefore can't use it. So you'd need to be
> careful in designing the setup to avoid that potential problem.
> 
> Overall I think it's better to have a slave host addressing both the
> availability and data durability concerns than to try to manage reusing
> storage volumes, but it might depend on the exact requirements for which
> approach was best.
> 
> Tim
> 
> On Wed, Oct 3, 2018, 2:56 PM Justin Bertram <jbertram@xxxxxxxxxx> wrote:
> 
> > > Would it be desirable for Artemis to support this functionality in the
> > > future though, i.e. if we raised it as a feature request?
> >
> > All things being equal I'd say probably so, but I suspect the effort to
> > implement the feature might outweigh the benefits.
> >
> > > The cloud can manage spinning up another node, but the problem is
> > > telling/getting the Artemis cluster to make that server the master now.
> >
> > The way I imagine it would work best is without any slave at all.  The
> > whole point of the slave is to take over quickly from a live broker that
> > has failed in such a way that all the data from the failed broker is still
> > available to clients.  Maybe I'm wrong about clouds, but I believe the
> > cloud itself can provide this functionality by quickly spinning up a new
> > broker when one fails.  So, you would have 3 live brokers in a cluster each
> > with a separate storage node.  There wouldn't be any slaves at all.  When
> > one of those brokers fails the cloud will spin up another to replace it and
> > re-attach to the storage node so that any reconnecting client has access to
> > all the data as before just like it would on a slave.  Or is that not how
> > clouds work?
> >
> >
> > Justin
> >
> > On Tue, Oct 2, 2018 at 10:50 PM schalmers <
> > simon.chalmers@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > > jbertram wrote
> > > > The master/slave/slave triplet architecture complicates fail-back quite a
> > > > bit and it's not something the broker handles gracefully at this point.
> > > > I'd recommend against using it for that reason.
> > >
> > > Would it be desirable for Artemis to support this functionality in the
> > > future though, i.e. if we raised it as a feature request?
> > >
> > >
> > > jbertram wrote
> > > > To Clebert's point...I also don't understand why you wouldn't let the cloud
> > > > infrastructure deal with spinning up another live node when one fails.  I
> > > > was under the impression that's kind of what clouds are for.
> > >
> > > The cloud can manage spinning up another node, but the problem is
> > > telling/getting the Artemis cluster to make that server the master now. From
> > > what I've read and been told, there's no way to failback to the master when
> > > there is already a backup for the (new) master.
> > >
> > > That's what I'm looking for help on, and what my original questions were about.
> > >
> > > If the position from Artemis is that there's no desire for Artemis to ever
> > > work that way, even if we ask/raise a feature request, then we just need to
> > > understand that so we can make design decisions in our application stack to
> > > cater for that.
> > >
> > >
> > >
> > > --
> > > Sent from:
> > > http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
> > >
> >
>