
Re: Designing for maximum Artemis performance


Sorry to have joined late, but Miroslav and Justin have completely covered
what I wanted to say: I totally agree with everything, +100.
I would add one thing: take a look here: https://softwaremill.com/mqperf/
Note how Artemis is used there: the MAPPED journal (with datasync off, now
on 2.6.x), but with replication to narrow the window for failure/losing
data :)
FYI, the MAPPED journal with datasync off protects you only against
application failures (a broker crash, not an OS or host failure), and
considering that you're in a cloud environment (+ replication if needed)
it could be enough.
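
For reference, a minimal broker.xml sketch of that journal setup (element
names as in the Artemis 2.6 docs; the directory path is a placeholder):

<core xmlns="urn:activemq:core">
   <!-- memory-mapped journal: fastest option, weakest durability -->
   <journal-type>MAPPED</journal-type>
   <!-- no fsync on writes: survives a broker crash, not an OS/host failure -->
   <journal-datasync>false</journal-datasync>
   <journal-directory>/var/lib/artemis/journal</journal-directory>
   <!-- replication narrows the data-loss window on host failure -->
   <ha-policy>
      <replication>
         <master/>
      </replication>
   </ha-policy>
</core>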



On Fri, 5 Oct 2018 at 08:35, Miroslav Novak <mnovak@xxxxxxxxxx> wrote:

> Hi Tim,
>
> > First, spinning up a replacement host is not instantaneous, so there
> > will be a period of at least a minute but possibly several where the
> > messages on that broker and storage volume will simply be unavailable
> > to consumers.
>
> In case you decide to use HA with a shared store, it will take some time
> for the slave to start as well. It needs to load the journal directory,
> which, if it is a few GB in size, can take a while; that is the longest
> part of starting the broker. It's best practice to have a good health
> check on the master so you can restart a new master ASAP. I don't
> recommend using HA with a replicated journal in a cloud environment,
> because the network in a cloud usually has high latency, is unreliable,
> and is hard to configure. It's something that is very hard to make robust
> and fast.
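>
> For reference, the shared-store pair would be configured roughly like
> this (a sketch; the exact element names are from my memory of the
> Artemis docs, and the slave points its journal-directory at the same
> shared volume):
>
> <!-- master broker.xml -->
> <ha-policy>
>    <shared-store>
>       <master>
>          <!-- fail over to the slave even on a clean shutdown -->
>          <failover-on-shutdown>true</failover-on-shutdown>
>       </master>
>    </shared-store>
> </ha-policy>
>
> <!-- slave broker.xml -->
> <ha-policy>
>    <shared-store>
>       <slave>
>          <!-- hand control back when the original master restarts -->
>          <allow-failback>true</allow-failback>
>       </slave>
>    </shared-store>
> </ha-policy>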
>
> > Second, it means that there is only one copy of a given message within
> > the broker cluster, so if that storage volume gets corrupted or fails,
> > you've lost data, which would be unacceptable in some use cases.
>
> Leave fault tolerance and redundancy to your cloud provider's storage. It
> can be backed by RAID for fault tolerance, and your storage can possibly
> be replicated to a backup location. As mentioned, in the case of HA with
> a shared store, if the storage gets corrupted then the backup will not
> start anyway. Thus RAID, and possibly replication at the storage level,
> sounds like the better option.
>
> Thanks,
> Mirek
>
> ----- Original Message -----
> > From: "Tim Bain" <tbain@xxxxxxxxxxxxxxx>
> > To: "ActiveMQ Users" <users@xxxxxxxxxxxxxxxxxxx>
> > Sent: Thursday, 4 October, 2018 3:01:52 PM
> > Subject: Re: Designing for maximum Artemis performance
> >
> > Justin,
> >
> > That approach will work, to a point, but it has (at least) two failure
> > cases that would be problematic.
> >
> > First, spinning up a replacement host is not instantaneous, so there
> > will be a period of at least a minute but possibly several where the
> > messages on that broker and storage volume will simply be unavailable
> > to consumers.
> >
> > Second, it means that there is only one copy of a given message within
> > the broker cluster, so if that storage volume gets corrupted or fails,
> > you've lost data, which would be unacceptable in some use cases.
> >
> > There'd also be a failure case if the number of hosts is not an even
> > multiple of the number of AZs, where the new host comes up in a
> > different AZ than the storage volume and therefore can't use it (e.g.
> > with 4 brokers across 3 AZs, a replacement could be launched in an AZ
> > that can't attach the old volume). So you'd need to be careful in
> > designing the setup to avoid that potential problem.
> >
> > Overall I think it's better to have a slave host addressing both the
> > availability and data durability concerns than to try to manage reusing
> > storage volumes, but which approach is best might depend on the exact
> > requirements.
> >
> > Tim
> >
> > On Wed, Oct 3, 2018, 2:56 PM Justin Bertram <jbertram@xxxxxxxxxx> wrote:
> >
> > > > Would it be desirable for Artemis to support this functionality in
> > > > the future though, i.e. if we raised it as a feature request?
> > >
> > > All things being equal I'd say probably so, but I suspect the effort to
> > > implement the feature might outweigh the benefits.
> > >
> > > > The cloud can manage spinning up another node, but the problem is
> > > > telling/getting the Artemis cluster to make that server the master now.
> > >
> > > The way I imagine it would work best is without any slave at all.  The
> > > whole point of the slave is to take over quickly from a live broker
> > > that has failed, in such a way that all the data from the failed broker
> > > is still available to clients.  Maybe I'm wrong about clouds, but I
> > > believe the cloud itself can provide this functionality by quickly
> > > spinning up a new broker when one fails.  So, you would have 3 live
> > > brokers in a cluster, each with a separate storage node.  There
> > > wouldn't be any slaves at all.  When one of those brokers fails, the
> > > cloud spins up another to replace it and re-attaches it to the storage
> > > node, so that any reconnecting client has access to all the data as
> > > before, just like it would on a slave.  Or is that not how clouds work?
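> > >
> > > To make that concrete, each of the 3 live brokers would just be
> > > clustered with no HA policy at all, something like the following
> > > sketch (connector name, host, and discovery-group name are
> > > placeholders; the discovery group would be defined elsewhere):
> > >
> > >    <connectors>
> > >       <!-- this broker's own connector, advertised to the cluster -->
> > >       <connector name="broker1">tcp://broker1:61616</connector>
> > >    </connectors>
> > >    <cluster-connections>
> > >       <cluster-connection name="my-cluster">
> > >          <connector-ref>broker1</connector-ref>
> > >          <!-- find the other live brokers, e.g. via a discovery group -->
> > >          <discovery-group-ref discovery-group-name="dg-group1"/>
> > >       </cluster-connection>
> > >    </cluster-connections>
> > >
> > >    <!-- note: no <ha-policy>; the cloud replaces failed nodes -->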
> > >
> > >
> > > Justin
> > >
> > > On Tue, Oct 2, 2018 at 10:50 PM schalmers <
> > > simon.chalmers@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > jbertram wrote
> > > > > The master/slave/slave triplet architecture complicates fail-back
> > > > > quite a bit and it's not something the broker handles gracefully at
> > > > > this point.  I'd recommend against using it for that reason.
> > > >
> > > > Would it be desirable for Artemis to support this functionality in
> > > > the future though, i.e. if we raised it as a feature request?
> > > >
> > > >
> > > > jbertram wrote
> > > > > To Clebert's point...I also don't understand why you wouldn't let
> > > > > the cloud infrastructure deal with spinning up another live node
> > > > > when one fails.  I was under the impression that's kind of what
> > > > > clouds are for.
> > > >
> > > > The cloud can manage spinning up another node, but the problem is
> > > > telling/getting the Artemis cluster to make that server the master
> > > > now. From what I've read and been told, there's no way to fail back
> > > > to the master when there is already a backup for the (new) master.
> > > >
> > > > That's what I'm looking for help on, and what my original questions
> > > > were about.
> > > >
> > > > If the position from Artemis is that there's no desire for Artemis
> > > > to ever work that way, even if we ask/raise a feature request, then
> > > > we just need to understand that so we can make design decisions in
> > > > our application stack to cater for that.
> > > >
> > > >
> > > >
> > > > --
> > > > Sent from:
> > > > http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
> > > >
> > >
> >
>